Plant reproduction polynucleotides and methods of use

ABSTRACT

This invention relates to an isolated nucleic acid fragment encoding a reproduction protein. The invention also relates to the construction of a chimeric gene encoding all or a portion of the reproduction protein, in sense or antisense orientation, wherein expression of the chimeric gene results in production of altered levels of the reproduction protein in a transformed host cell. The invention also provides isolated transcriptional regulatory elements and polynucleotides associated therewith.

This application is a divisional of U.S. Non-provisional application Ser. No. 09/967,552, filed Jun. 7, 2004, which is a Continuation of Ser. No. 09/967,552, filed Sep. 28, 2001, which is a Continuation-in-part of international application PCT/US00/23735 filed 30 Aug. 2000, which claims priority to U.S. Provisional Application No. 60/151,575 filed 31 Aug. 1999, all of which are incorporated by reference.

FIELD OF THE INVENTION

This invention is in the field of plant molecular biology. More specifically, this invention pertains to nucleic acid fragments encoding proteins involved in endosperm and embryo development in plant seeds.

BACKGROUND OF THE INVENTION

Reproduction in flowering plants involves two fertilization events. A sperm fuses with the egg cell to form a zygote which becomes the embryo; a second sperm cell fuses with the doubled-haploid central cell nucleus to form the starting point of the triploid endosperm tissue. While fertilization is thus normally the trigger for seed development, mutants have been identified in which reproductive processes are initiated independent of fertilization. Such mutations uncouple components of seed development from the fertilization process, resulting in developmental patterns resembling those found in apomictic plants.

Arabidopsis fie mutants (for fertilization-independent endosperm) isolated by Ohad et al. (Proc. Natl. Acad. Sci. USA 93:5319-5324, 1996; see also U.S. Pat. No. 6,229,064) exhibit replication of the central cell nucleus, initiating endosperm development, in the absence of fertilization. Inheritance of the mutant fie allele by the female gametophyte results in embryo abortion; thus, the trait can be transmitted to progeny only by the male gametophyte. The Arabidopsis FIE gene was cloned (Ohad et al., The Plant Cell 11:407-416 (1999); GenBank entry AF129516) and found to encode a polypeptide related to the WD Polycomb group proteins encoded by, for example, Esc in Drosophila (Gutjahr et al., EMBO J. 14:4296-4306 (1995); Sathe and Harte, Mech. Dev. 52:77-87 (1995); Jones and Gelbart, Mol. Cell. Biol. 13:6357-6366 (1993)). WD polycomb proteins may interact with other polynucleotides to form complexes which interfere with gene transcription (Pirrotta, Cell 93:333-336 (1998)). Fertilization may trigger alteration of the protein complexes, allowing transcription of genes involved in endosperm development. Thus, loss-of-function fie mutants would lack the ability to form the protein complexes which repress transcription, and endosperm development could proceed independent of fertilization (Ohad et al. 1999, supra).

Chaudhury et al. (Proc. Natl. Acad. Sci. USA 94:4223 (1997)) reported fis (fertilization-independent seed) mutants in Arabidopsis. In fis1 and fis2 seed, the endosperm develops to the point of cellularization before atrophying. Proembryos are formed in a low proportion of seeds but do not develop beyond the globular stage. The FIS1 and FIS2 genes were cloned and further characterized. The FIS2 gene comprised structures suggesting function as a transcription factor; the FIS1 gene was found to be allelic (Proc. Natl. Acad. Sci. USA 96:296 (1999)) to the Arabidopsis gene MEDEA (Grossniklaus et al. Science 280:446 (1998)).

Apomixis (asexual reproduction) may occur through vegetative reproduction or through agamospermy, the formation of seeds without fertilization. Generally, agamospermy has not been exploited in agriculture; however, it has numerous potential applications, including perpetuation of high yielding crop plant hybrids and varieties, and maintenance of pure inbred lines. Also, seed formation without fertilization avoids factors that can reduce the efficiency of seed set, such as pollen count and pollen viability, and stigma or anther emergence or viability. Agamospermy would also allow the immediate stable incorporation of transgenes without the need for selfing to produce homozygotes. In addition, the fertilization-independent endosperm gene and other related genes could be used to cause the formation of a fertilization-independent endosperm without necessarily forming a viable embryo. Such a seed would not germinate because it lacks an embryo. However, the endosperm, if sufficiently formed, could be used for human and animal food and for commercial milling and extraction. Such embryo-less seeds would have the added advantage of allowing containment of genetically modified organisms to satisfy environmental and regulatory concerns. Such seeds could also be independently modified to produce novel products in the endosperm such as pharmaceuticals, nutraceuticals, and industrial compounds and polymers.

Identification of specific genes involved in agamospermy, such as fertilization-independent endosperm genes, will offer new ways of producing apomictic plants. Such approaches may involve selective mutagenesis of fertilization-independent endosperm genes and then tracking of the mutant alleles in a molecular breeding program, or transgenic methods. Accordingly, identification and isolation of nucleic acid sequences encoding all or a portion of a protein affecting seed development independent of fertilization would facilitate studies of developmental regulation in plants and provide genetic tools to engineer apomixis.

SUMMARY OF THE INVENTION

The present invention concerns an isolated polynucleotide comprising a nucleotide sequence selected from the group consisting of: (a) a first nucleotide sequence encoding a functional fertilization-independent-endosperm (FIE) polypeptide having at least 80% identity, based on the GAP (GCG Version 10) method of alignment, to a polypeptide selected from the group consisting of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68 and 70.

In a second embodiment, it is preferred that the isolated polynucleotide of the claimed invention comprise a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67 and 69 that codes for the polypeptide selected from the group consisting of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68 and 70.

In a third embodiment, this invention concerns an isolated polynucleotide comprising a nucleotide sequence of at least about 30 contiguous nucleotides derived from a nucleotide sequence selected from the group consisting of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 72, and the complement of each such nucleotide sequence.

In a fourth embodiment, this invention relates to a chimeric gene comprising an isolated polynucleotide of the present invention operably linked to at least one suitable regulatory sequence.

In a fifth embodiment, the present invention concerns an isolated host cell comprising a chimeric gene of the present invention or an isolated polynucleotide of the present invention. The host cell may be eukaryotic, such as a plant cell, or prokaryotic, such as a bacterial cell. The present invention also relates to a virus, preferably a baculovirus, comprising an isolated polynucleotide of the present invention or a chimeric gene of the present invention.

In a sixth embodiment, the invention also relates to a process for producing an isolated host cell comprising a chimeric gene of the present invention or an isolated polynucleotide of the present invention, the process comprising either transforming or transfecting an isolated compatible host cell with a chimeric gene or isolated polynucleotide of the present invention.

In a seventh embodiment, the invention concerns a fertilization-independent endosperm polypeptide at least 80% identical, based on the GAP (GCG Version 10) method of alignment, to a polypeptide selected from the group consisting of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68 and 70.

In an eighth embodiment, the invention relates to a method of selecting an isolated polynucleotide that affects the level of expression of a fertilization-independent endosperm polypeptide or enzyme activity in a host cell, preferably a plant cell, the method comprising the steps of: (a) constructing an isolated polynucleotide of the present invention or an isolated chimeric gene of the present invention; (b) introducing the isolated polynucleotide or the isolated chimeric gene into a host cell; (c) measuring the level of the fertilization-independent endosperm polypeptide or enzyme activity in the host cell containing the isolated polynucleotide; and (d) comparing the level of the fertilization-independent endosperm polypeptide or enzyme activity in the host cell containing the isolated polynucleotide with the level of the fertilization-independent endosperm polypeptide or enzyme activity in a host cell that does not contain the isolated polynucleotide.

In a ninth embodiment, the invention concerns a method of obtaining a nucleic acid fragment encoding a substantial portion of a fertilization-independent endosperm polypeptide, preferably a plant fertilization-independent endosperm polypeptide, comprising the steps of: (a) synthesizing an oligonucleotide primer comprising a nucleotide sequence of at least 30 contiguous nucleotides derived from a nucleotide sequence selected from the group consisting of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 72, and the complement of each such nucleotide sequence; and (b) amplifying a nucleic acid fragment (preferably a cDNA inserted in a cloning vector) using the oligonucleotide primer. The amplified nucleic acid fragment preferably will encode a substantial portion of a fertilization-independent polypeptide.

In a tenth embodiment, this invention relates to a method of obtaining a nucleic acid fragment encoding all or a substantial portion of the amino acid sequence comprising a fertilization-independent endosperm polypeptide, such method comprising the steps of: (a) probing a cDNA or genomic library with an isolated polynucleotide of the present invention; (b) identifying a DNA clone that hybridizes with an isolated polynucleotide of the present invention; (c) isolating the identified DNA clone; and (d) sequencing the cDNA or genomic fragment that comprises the isolated DNA clone.

In an eleventh embodiment, this invention concerns a composition, such as a hybridization mixture, comprising an isolated polynucleotide of the present invention.

In a twelfth embodiment, this invention concerns a method for positive selection of a transformed cell comprising: (a) transforming a host cell with the chimeric gene of the present invention or an expression cassette of the present invention; (b) growing the transformed host cell, preferably a plant cell, such as a monocot or a dicot, under conditions which allow expression of the fertilization-independent endosperm polynucleotide in an amount sufficient to complement a null mutant to provide a positive selection means.

In a thirteenth embodiment, this invention relates to a method of altering the level of expression of an fie protein in a host cell comprising: (a) transforming a host cell with a chimeric gene of the present invention; and (b) growing the transformed host cell under conditions that are suitable for expression of the chimeric gene wherein expression of the chimeric gene results in altered levels of the fie protein in the transformed host cell. The fie protein may act in suppressing transcription of genes involved in endosperm formation.

A fourteenth embodiment relates to an isolated chromosomal polynucleotide of the claimed invention which comprises a first nucleotide sequence selected from the group consisting of SEQ ID NOS: 71 and 72.

A fifteenth embodiment relates to regulatory sequences associated with Zea mays fie polynucleotides comprising SEQ ID NOS: 73 and 74.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 shows the pattern of direct repeats in the ZmFIE-B 5′ upstream region.

BRIEF DESCRIPTION OF THE SEQUENCE LISTING

The invention can be more fully understood from the following detailed description and the accompanying Sequence Listing which form a part of this application.

Table 1 lists the polynucleotides and polypeptides that are described herein, the designation of the cDNA clones and chromosomal sequences that comprise the nucleic acid fragments encoding all or a substantial portion of these polypeptides, and the corresponding identifier (SEQ ID NO:) as used in the attached Sequence Listing. The sequence descriptions and Sequence Listing attached hereto comply with the rules governing nucleotide and/or amino acid sequence disclosures in patent applications as set forth in 37 C.F.R. §1.821-1.825. The Sequence Listing contains the one-letter code for nucleotide sequence characters and the three-letter codes for amino acids as defined in conformity with the IUPAC-IUBMB standards described in Nucleic Acids Res. 13:3021-3030 (1985) and in Biochemical J. 219 (No. 2):345-373 (1984), which are herein incorporated by reference.

TABLE 1
Reproduction Proteins and Polynucleotides

                                                                                            SEQ ID NO:
Protein                                            Clone Designation                        (Nucleotide)  (Amino Acid)
Fertilization-independent endosperm protein (CGS)  ccase-b.pk0026.g4                        1             2
Fertilization-independent endosperm protein (CGS)  cen1.mn0001.g10                          3             4
Fertilization-independent endosperm protein (CGS)  cen3n.pk0076.b8                          5             6
Fertilization-independent endosperm protein (FIS)  cpb1c.pk001.d10                          7             8
Fertilization-independent endosperm protein (CGS)  eec1c.pk003.e23                          9             10
Fertilization-independent endosperm protein (FIS)  hlp1c.pk003.e8                           11            12
Fertilization-independent endosperm protein (CGS)  ncs.pk0019.h3                            13            14
Fertilization-independent endosperm protein (EST)  p0003.cgpfn34f                           15            16
Fertilization-independent endosperm protein (CGS)  p0003.cgped29rb                          17            18
Fertilization-independent endosperm protein (FIS)  p0037.crwao47r                           19            20
Fertilization-independent endosperm protein (FIS)  p0041.crtaw93r                           21            22
Fertilization-independent endosperm protein (CGS)  p0101.cgamg48r                           23            24
Fertilization-independent endosperm protein (CGS)  p0104.cabbn62r                           25            26
Fertilization-independent endosperm protein (CGS)  p0107.cbcai79r                           27            28
Fertilization-independent endosperm protein (CGS)  p0119.cmtoh49r                           29            30
Fertilization-independent endosperm protein (FIS)  p0120.cdebd48r                           31            32
Fertilization-independent endosperm protein (CGS)  rcal1c.pk0001.d2                         33            34
Fertilization-independent endosperm protein (CGS)  ses2w.pk0015.b10                         35            36
Fertilization-independent endosperm protein (CGS)  wkm1c.pk0003.f4                          37            38
Fertilization-independent endosperm protein (EST)  ccase-b.pk0026.g4                        39            40
Fertilization-independent endosperm protein (EST)  cen1.mn0001.g10                          41            42
Fertilization-independent endosperm protein (EST)  cpb1c.pk001.d10                          43            44
Fertilization-independent endosperm protein (EST)  eec1c.pk003.e23                          45            46
Fertilization-independent endosperm protein (EST)  hlp1c.pk003.e8                           47            48
Fertilization-independent endosperm protein (EST)  ncs.pk0019.h3                            49            50
Fertilization-independent endosperm protein (EST)  p0003.cgpfn34rb                          51            52
Fertilization-independent endosperm protein (EST)  p0003.cgped29rb                          53            54
Fertilization-independent endosperm protein (EST)  p0037.crwao47r                           55            56
Fertilization-independent endosperm protein (EST)  p0041.crtaw93r                           57            58
Fertilization-independent endosperm protein (EST)  p0104.cabbn62r                           59            60
Fertilization-independent endosperm protein (CGS)  p0107.cbcai79r                           61            62
Fertilization-independent endosperm protein (EST)  p0120.cdebd48r                           63            64
Fertilization-independent endosperm protein (EST)  rcal1c.pk0001.d2                         65            66
Fertilization-independent endosperm protein (EST)  ses2w.pk0015.b10                         67            68
Fertilization-independent endosperm protein (EST)  wkm1c.pk0003.f4                          69            70
Fertilization-independent endosperm protein        Genomic Sequence for ZmFIE-B             71
Fertilization-independent endosperm protein        Genomic Sequence for ZmFIE-A             72
5′ non-coding region                               Genomic 5′ upstream sequence of ZmFIE-A  73
5′ non-coding region                               Genomic 5′ upstream sequence of ZmFIE-B  74
ZmFIE-B partial genomic sequence                   From B73                                 75
Forward primer                                     For Mo17 and B73                         76
Reverse primer                                     For B73                                  77
Reverse primer                                     For Mo17                                 78
Primer                                             Mu-specific                              79
Primer                                             Gene-specific                            80
Primer                                             Gene-specific                            81
Primer                                             Gene-specific                            82

DETAILED DESCRIPTION OF THE INVENTION

In the context of this disclosure, a number of terms shall be utilized. The terms “polynucleotide”, “polynucleotide sequence”, “nucleic acid sequence”, “nucleic acid fragment” and “isolated nucleic acid fragment” are used interchangeably herein. These terms encompass nucleotide sequences and the like. A polynucleotide may be a polymer of RNA or DNA that is single- or double-stranded, that optionally contains synthetic, non-natural or altered nucleotide bases. A polynucleotide in the form of a polymer of DNA may be comprised of one or more segments of cDNA, genomic DNA, synthetic DNA, or mixtures thereof. An isolated polynucleotide of the present invention may include at least 60 contiguous nucleotides, preferably at least 40 contiguous nucleotides, most preferably at least 30 contiguous nucleotides derived from SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 72, and the complement of each such sequence.
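As an illustration only, the "contiguous nucleotides derived from" a listed sequence "and the complement" can be pictured as a substring test against a reference sequence and its reverse complement. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the reference is any plain A/C/G/T string supplied by the user (not a sequence from the Sequence Listing), and "complement" is interpreted here as the reverse complement.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def shares_contiguous_run(candidate: str, reference: str, min_len: int = 30) -> bool:
    """True if candidate contains a run of min_len nucleotides that also occurs
    in reference or in the reverse complement of reference.

    min_len defaults to 30, the smallest run length named in the text.
    """
    candidate = candidate.upper()
    targets = (reference.upper(), reverse_complement(reference))
    for start in range(len(candidate) - min_len + 1):
        window = candidate[start:start + min_len]
        if any(window in target for target in targets):
            return True
    return False


# Hypothetical 44-nt reference, invented for this example.
reference = "ATGGCGTACGTTAGCCTAGGATCCGATTACGGCTAAGCTTGGCA"
print(shares_contiguous_run(reference[5:40], reference))  # 35-nt fragment of the reference
```

A candidate shorter than `min_len` always returns False, because no window of the required length exists.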

The term “isolated” polynucleotide refers to a polynucleotide that is substantially free from other nucleic acid sequences, such as and not limited to, other chromosomal and extrachromosomal DNA and RNA. Isolated polynucleotides may be purified from a host cell in which they naturally occur. Conventional nucleic acid purification methods known to skilled artisans may be used to obtain isolated polynucleotides. The term also embraces recombinant polynucleotides and chemically synthesized polynucleotides.

The term “recombinant” means, for example, that a nucleic acid sequence is made by an artificial combination of two otherwise separated segments of sequence, e.g., by chemical synthesis or by the manipulation of isolated nucleic acids by genetic engineering techniques.

As used herein, “contig” refers to a nucleotide sequence that is assembled from two or more constituent nucleotide sequences that share common or overlapping regions of sequence homology. For example, the nucleotide sequences of two or more nucleic acid fragments can be compared and aligned in order to identify common or overlapping sequences. Where common or overlapping sequences exist between two or more nucleic acid fragments, the sequences (and thus their corresponding nucleic acid fragments) can be assembled into a single contiguous nucleotide sequence.
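The assembly of two overlapping fragments into a single contiguous sequence can be sketched as follows. This is a simplified illustration, not the assembly procedure used for the clones described herein: it assumes an exact suffix-to-prefix overlap with no mismatches, and the function name, the `min_overlap` parameter and the example fragments are hypothetical.

```python
def merge_overlap(left: str, right: str, min_overlap: int = 10):
    """Join two fragments when a suffix of left exactly equals a prefix of right.

    Returns the merged sequence using the longest such overlap, or None when
    no overlap of at least min_overlap bases exists.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    max_len = min(len(left), len(right))
    # Try the longest possible overlap first, then shorter ones.
    for size in range(max_len, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    return None


# Two invented fragments sharing the 12-base overlap GGCTTACCGATG.
contig = merge_overlap("ACGTTGCAAGGCTTACCGATG", "GGCTTACCGATGTTTAAACCC")
print(contig)
```

Practical assemblers tolerate mismatches and sequencing errors in the overlap; the exact-match requirement here only keeps the sketch short.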

As used herein, “substantially similar” refers to nucleic acid fragments wherein changes in one or more nucleotide bases may result in substitution of one or more amino acids, but do not affect the functional properties of the polypeptide encoded by the nucleotide sequence. “Substantially similar” also refers to nucleic acid fragments wherein changes in one or more nucleotide bases do not affect the ability of the nucleic acid fragment to mediate alteration of gene expression through, for example, antisense or co-suppression technology, or through acting as a promoter. “Substantially similar” also refers to modifications of the nucleic acid fragments of the instant invention, such as deletion or insertion of one or more nucleotides, that do not substantially affect the functional properties of the resulting transcript (such as in the ability to mediate gene silencing) or do not result in alteration of the functional properties of the resulting protein molecule. It is therefore understood that the invention encompasses more than the specific exemplary nucleotide or amino acid sequences and includes functional equivalents thereof. The terms “substantially similar” and “corresponding substantially” are used interchangeably herein.

Substantially similar nucleic acid fragments may be selected by screening nucleic acid fragments, representing subfragments or modifications of the nucleic acid fragments of the instant invention wherein one or more nucleotides are substituted, deleted and/or inserted, for their ability to affect the level of the polypeptide encoded by the unmodified nucleic acid fragment (the “subject polypeptide”) in a plant or plant cell. For example, a substantially similar nucleic acid fragment derived from the instant nucleic acid fragment can be constructed and introduced into a plant or plant cell. The level of the subject polypeptide in a plant or plant cell comprising the substantially similar nucleic acid fragment can then be compared to the level of the polypeptide in a plant or plant cell that does not comprise the substantially similar nucleic acid fragment.

For example, it is well known in the art that antisense suppression and co-suppression of gene expression may be accomplished using nucleic acid fragments representing less than the entire coding region of a gene, and by using nucleic acid fragments that do not share 100% sequence identity with the gene to be suppressed. Moreover, alterations at a given site in a nucleic acid fragment which result in the production of a chemically equivalent amino acid, but which do not affect the functional properties of the encoded polypeptide, are well known in the art. Thus, a codon for the amino acid alanine, a hydrophobic amino acid, may be substituted by a codon encoding another less hydrophobic residue, such as glycine, or a more hydrophobic residue, such as valine, leucine, or isoleucine. Similarly, changes which result in substitution of one negatively-charged residue for another, such as aspartic acid for glutamic acid, or one positively-charged residue for another, such as lysine for arginine, can also be expected to produce a functionally equivalent product. Nucleotide changes which result in alteration of the N-terminal and C-terminal portions of the polypeptide molecule would also not be expected to alter the activity of the polypeptide. Each of the proposed modifications is well within the routine skill in the art, as is determination of retention of biological activity of the encoded products. Consequently, an isolated polynucleotide comprising a nucleotide sequence of at least 30 contiguous nucleotides, derived from a nucleotide sequence selected from the group consisting of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71 and 72, may be used in methods of selecting an isolated polynucleotide that affects the expression of a fertilization-independent endosperm polypeptide in a host cell. A method of selecting an isolated polynucleotide that affects the level of expression of a polypeptide in a virus or in a eukaryotic or prokaryotic host may comprise the steps of: (a) constructing an isolated polynucleotide of the present invention or an isolated chimeric gene of the present invention; (b) introducing the isolated polynucleotide or the isolated chimeric gene into a host cell; (c) measuring the level of a polypeptide or enzyme activity in the host cell containing the isolated polynucleotide; and (d) comparing the level of a polypeptide or enzyme activity in the host cell comprising the isolated polynucleotide with the level of a polypeptide or enzyme activity in a host cell that does not comprise the isolated polynucleotide.

Moreover, substantially similar nucleic acid fragments may also be characterized by their ability to hybridize. Estimates of homology are provided by either DNA-DNA or DNA-RNA hybridization under conditions of stringency as is well understood by those skilled in the art (Hames and Higgins, Eds. (1985) Nucleic Acid Hybridisation, IRL Press, Oxford, U.K.). By “stringent conditions” or “stringent hybridization conditions” is intended conditions under which a probe will hybridize to its target sequence to a detectably greater degree than to other sequences (e.g., at least 2-fold over background). Stringency conditions can be adjusted to screen for moderately similar fragments, such as homologous sequences from distantly related organisms, or to screen for highly similar fragments, such as genes that duplicate functional enzymes from closely-related organisms. Stringent conditions are sequence-dependent and will be different in different circumstances. By controlling the stringency of the hybridization and/or washing conditions, target sequences that are 100% complementary to the probe can be identified (homologous probing). Alternatively, stringency conditions can be adjusted to allow some mismatching in sequences so that lower degrees of identity are detected (heterologous probing). Generally, a probe is less than about 1000 nucleotides in length, preferably less than 500 nucleotides in length.

Typically, stringent conditions will be those in which the salt concentration is less than about 1.5 M Na ion, typically about 0.01 to 1.0 M Na ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30° C. for short probes (e.g., 10 to 50 nucleotides) and at least about 60° C. for long probes (e.g., greater than 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide. Exemplary low stringency conditions include hybridization with a buffer solution of 30 to 35% formamide, 1 M NaCl, 1% SDS (sodium dodecyl sulphate) at 37° C., and a wash in 1× to 2×SSC (20×SSC=3.0 M NaCl/0.3 M trisodium citrate) at 50 to 55° C. Exemplary moderate stringency conditions include hybridization in 40 to 45% formamide, 1.0 M NaCl, 1% SDS at 37° C., and a wash in 0.5× to 1×SSC at 55 to 60° C. Exemplary high stringency conditions include hybridization in 50% formamide, 1 M NaCl, 1% SDS at 37° C., and a wash in 0.1×SSC at 60 to 65° C. Duration of hybridization is generally less than about 24 hours, usually about 4 to about 12 hours.
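To relate the SSC wash strengths above to the stated sodium ion range, the approximate Na ion molarity of a diluted SSC solution can be worked out from the 20×SSC composition given in the text. The sketch below is illustrative arithmetic only; it assumes complete dissociation, counting one sodium ion per NaCl and three per trisodium citrate, and the function name is hypothetical.

```python
NACL_M_AT_20X = 3.0     # from the text: 20x SSC contains 3.0 M NaCl
CITRATE_M_AT_20X = 0.3  # from the text: 20x SSC contains 0.3 M trisodium citrate


def ssc_sodium_molarity(fold: float) -> float:
    """Approximate Na+ molarity of an n-fold SSC solution.

    Assumes full dissociation: 1 Na+ per NaCl and 3 Na+ per trisodium citrate.
    """
    sodium_at_20x = NACL_M_AT_20X + 3 * CITRATE_M_AT_20X
    return fold * sodium_at_20x / 20.0


for fold in (2.0, 1.0, 0.5, 0.1):
    print(f"{fold}x SSC: about {ssc_sodium_molarity(fold):.4f} M Na+")
```

Under these assumptions, 1×SSC corresponds to roughly 0.195 M Na ion and 0.1×SSC to roughly 0.02 M, both within the 0.01 to 1.0 M range given above.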

Alternatively, one set of preferred conditions uses a series of washes starting with 6×SSC, 0.5% SDS at room temperature for 15 min, then with 2×SSC, 0.5% SDS at 45° C. for 30 min, and then twice with 0.2×SSC, 0.5% SDS at 50° C. for 30 min. A more-preferred set of stringent conditions uses washes identical to those above except that the temperature of the final two 30-minute washes is increased to 60° C. Another preferred set of highly stringent conditions uses two final washes in 0.1×SSC, 0.1% SDS at 65° C.

Specificity is typically the function of post-hybridization washes, the critical factors being the ionic strength and temperature of the final wash solution. For DNA-DNA hybrids, the T_m can be approximated from the equation of Meinkoth and Wahl (1984) Anal. Biochem. 138:267-284: T_m = 81.5° C. + 16.6 (log M) + 0.41 (% GC) − 0.61 (% form) − 500/L; where M is the molarity of monovalent cations, % GC is the percentage of guanosine and cytosine nucleotides in the DNA, % form is the percentage of formamide in the hybridization solution, and L is the length of the hybrid in base pairs. The T_m is the temperature (under defined ionic strength and pH) at which 50% of a complementary target sequence hybridizes to a perfectly matched probe. T_m is reduced by about 1° C. for each 1% of mismatching; thus, T_m, hybridization, and/or wash conditions can be adjusted to hybridize to sequences of the desired identity. For example, if sequences with >90% identity are sought, the T_m can be decreased 10° C. Generally, stringent conditions are selected to be about 5° C. lower than the thermal melting point (T_m) for the specific sequence and its complement at a defined ionic strength and pH. However, severely stringent conditions can utilize a hybridization and/or wash at 1, 2, 3, or 4° C. lower than the thermal melting point (T_m); moderately stringent conditions can utilize a hybridization and/or wash at 6, 7, 8, 9, or 10° C. lower than the thermal melting point (T_m); low stringency conditions can utilize a hybridization and/or wash at 11, 12, 13, 14, 15, or 20° C. lower than the thermal melting point (T_m). Using the equation, hybridization and wash compositions, and desired T_m, those of ordinary skill will understand that variations in the stringency of hybridization and/or wash solutions are inherently described. If the desired degree of mismatching results in a T_m of less than 45° C. (aqueous solution) or 32° C. (formamide solution), it is preferred to increase the SSC concentration so that a higher temperature can be used. An extensive guide to the hybridization of nucleic acids is found in Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic Acid Probes, Part I, Chapter 2 (Elsevier, New York); and Ausubel et al., eds. (1995) Current Protocols in Molecular Biology, Chapter 2 (Greene Publishing and Wiley-Interscience, New York). See Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, N.Y.).
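The melting temperature approximation and the stringency offsets described above lend themselves to a short worked calculation. The sketch below is offered only as an illustration: the function names are hypothetical, the logarithm is taken as base 10, and the example inputs (1 M monovalent cation, 50% GC, 50% formamide, a 500 bp hybrid) are arbitrary values chosen for the arithmetic, not conditions prescribed by the text.

```python
import math


def melting_temperature(cation_molar: float, gc_percent: float,
                        formamide_percent: float, length_bp: int) -> float:
    """Approximate T_m (deg C) of a DNA-DNA hybrid using the equation quoted
    in the text: 81.5 + 16.6 log10(M) + 0.41 (%GC) - 0.61 (%form) - 500/L."""
    return (81.5
            + 16.6 * math.log10(cation_molar)
            + 0.41 * gc_percent
            - 0.61 * formamide_percent
            - 500.0 / length_bp)


def mismatch_adjusted_tm(tm: float, percent_mismatch: float) -> float:
    """Lower T_m by about 1 deg C per 1% mismatch, as stated in the text."""
    return tm - percent_mismatch


def wash_temperature(tm: float, degrees_below: float = 5.0) -> float:
    """Wash temperature a chosen number of degrees below T_m; the default of
    5 deg C is the 'generally stringent' offset named in the text."""
    return tm - degrees_below


tm = melting_temperature(1.0, 50.0, 50.0, 500)   # 81.5 + 0 + 20.5 - 30.5 - 1.0
tm_90 = mismatch_adjusted_tm(tm, 10.0)           # allow up to 10% mismatch
print(tm, tm_90, wash_temperature(tm_90))
```

For these inputs the perfectly matched T_m is 70.5° C.; allowing 10% mismatch lowers it by 10° C., and a wash 5° C. below that adjusted value follows the general rule stated above.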

Substantially similar nucleic acid fragments of the instant invention may also be characterized by the percent identity of their encoded amino acid sequences to the amino acid sequences disclosed herein, as determined by algorithms commonly employed by those skilled in this art. Suitable nucleic acid fragments (isolated polynucleotides of the present invention) encode polypeptides that are at least about 70% identical, preferably at least about 80% identical to the amino acid sequences reported herein. Preferred nucleic acid fragments encode amino acid sequences that are about 85% identical to the amino acid sequences reported herein. More preferred nucleic acid fragments encode amino acid sequences that are at least about 90% identical to the amino acid sequences reported herein. Most preferred are nucleic acid fragments that encode amino acid sequences that are at least about 95% identical to the amino acid sequences reported herein. Suitable nucleic acid fragments not only have the above identities but typically encode a polypeptide having at least 50 amino acids, preferably at least 100 amino acids, more preferably at least 150 amino acids, still more preferably at least 200 amino acids, and most preferably at least 250 amino acids.

The following terms are used to describe the sequence relationships between a polynucleotide/polypeptide of the present invention and a reference polynucleotide/polypeptide: (a) “reference sequence”, (b) “comparison window”, (c) “sequence identity”, and (d) “percentage of sequence identity”.

(a) As used herein, “reference sequence” is a defined sequence used as a basis for sequence comparison with a polynucleotide/polypeptide of the present invention. A reference sequence may be a subset or the entirety of a specified sequence; for example, as a segment of a full-length cDNA or gene sequence, or the complete cDNA or gene sequence.

(b) As used herein, “comparison window” includes reference to a contiguous and specified segment of a polynucleotide/polypeptide sequence, wherein the polynucleotide/polypeptide sequence may be compared to a reference sequence and wherein the portion of the polynucleotide/polypeptide sequence in the comparison window may comprise additions or deletions (i.e., gaps) compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. Generally, the comparison window is at least 20 contiguous nucleotides/amino acid residues in length, and optionally can be 30, 40, 50, 100, or longer. Those of skill in the art understand that to avoid a high similarity to a reference sequence due to inclusion of gaps in the polynucleotide/polypeptide sequence, a gap penalty is typically introduced and is subtracted from the number of matches.

Methods of alignment of sequences for comparison are well-known in the art. Optimal alignment of sequences for comparison may be conducted by the local homology algorithm of Smith and Waterman, Adv. Appl. Math. 2: 482 (1981); by the homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48: 443 (1970); by the search for similarity method of Pearson and Lipman, Proc. Natl. Acad. Sci. 85: 2444 (1988); by computerized implementations of these algorithms, including, but not limited to: CLUSTAL in the PC/Gene program by Intelligenetics, Mountain View, Calif.; GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Genetics Computer Group (GCG®) package, Accelrys, Inc., San Diego, Calif. The CLUSTAL program is well described by Higgins and Sharp, Gene 73: 237-244 (1988); Higgins and Sharp, CABIOS 5: 151-153 (1989); Corpet, et al., Nucleic Acids Research 16: 10881-90 (1988); Huang, et al., Computer Applications in the Biosciences 8: 155-65 (1992), and Pearson, et al., Methods in Molecular Biology 24: 307-331 (1994).

The BLAST family of programs which can be used for database similarity searches includes: BLASTN for nucleotide query sequences against nucleotide database sequences; BLASTX for nucleotide query sequences against protein database sequences; BLASTP for protein query sequences against protein database sequences; TBLASTN for protein query sequences against nucleotide database sequences; and TBLASTX for nucleotide query sequences against nucleotide database sequences. See, Current Protocols in Molecular Biology, Chapter 19, Ausubel, et al., Eds., Greene Publishing and Wiley-Interscience, New York (1995); Altschul et al., J. Mol. Biol., 215:403-410 (1990); and, Altschul et al., Nucleic Acids Res. 25:3389-3402 (1997).

Software for performing BLAST analyses is publicly available, e.g., through the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/BLAST). This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold. These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10, a cutoff of 100, M=5, N=-4, and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff & Henikoff (1992) Proc. Natl. Acad. Sci. USA 89:10915).
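By way of illustration only, the extension step described above can be sketched as follows. The sketch is not the BLAST implementation: it extends an exactly matching seed word to the right only, without gaps, using a positive match reward M and a negative mismatch penalty N, and halts under the three conditions recited above. The function name `extend_right`, its argument names, and the value of X used in the example are hypothetical and chosen for illustration.

```python
def extend_right(query, subject, q_start, s_start, word_len,
                 match=5, mismatch=-4, x_drop=20):
    """Extend a seed word hit to the right without gaps.

    Returns (best_score, best_extension_length). The seed is assumed to
    be an exact match of `word_len` residues starting at q_start/s_start.
    """
    score = match * word_len          # cumulative score of the seed itself
    best_score = score
    best_len = 0
    i = 0
    q_pos = q_start + word_len
    s_pos = s_start + word_len
    # Halt condition 3: the end of either sequence is reached.
    while q_pos + i < len(query) and s_pos + i < len(subject):
        same = query[q_pos + i] == subject[s_pos + i]
        score += match if same else mismatch
        i += 1
        if score > best_score:
            best_score = score
            best_len = i
        # Halt condition 1: score has fallen off by X from its maximum.
        # Halt condition 2: cumulative score has gone to zero or below.
        if best_score - score >= x_drop or score <= 0:
            break
    return best_score, best_len


# Seed "ACGT" at position 0 in both sequences; four further matches,
# then mismatches until the score has dropped by X = 8.
print(extend_right("ACGTACGTAAAA", "ACGTACGTCCCC", 0, 0, 4, x_drop=8))
```

A complete implementation would also extend to the left and, for amino acid sequences, would look up each residue pair in a scoring matrix rather than use fixed M and N values.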

In addition to calculating percent sequence identity, the BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin & Altschul, Proc. Nat'l. Acad. Sci. USA 90:5873-5877 (1993)). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance.

BLAST searches assume that proteins can be modeled as random sequences. However, many real proteins comprise regions of nonrandom sequences which may be homopolymeric tracts, short-period repeats, or regions enriched in one or more amino acids. Such low-complexity regions may be aligned between unrelated proteins even though other regions of the protein are entirely dissimilar. A number of low-complexity filter programs can be employed to reduce such low-complexity alignments. For example, the SEG (Wootton and Federhen, Comput. Chem., 17:149-163 (1993)) and XNU (Claverie and States, Comput. Chem., 17:191-201 (1993)) low-complexity filters can be employed alone or in combination.
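The idea behind low-complexity filtering can be illustrated with a simplified sketch. The code below is neither SEG nor XNU; it substitutes a plain sliding-window Shannon entropy measure, and the window length of 12 and the threshold of 2.2 bits are assumptions chosen for the example. Residues in any window whose entropy falls below the threshold are replaced with "X" so that they do not seed alignments.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Shannon entropy, in bits, of the residue composition of a segment."""
    n = len(segment)
    counts = Counter(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq, window=12, threshold=2.2):
    """Mask with 'X' every residue lying in a window of low entropy.

    `window` and `threshold` are illustrative values, not SEG defaults.
    Sequences shorter than the window are returned unchanged.
    """
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for k in range(start, start + window):
                masked[k] = "X"
    return "".join(masked)


# A homopolymeric glutamine tract is masked; a diverse segment is kept.
print(mask_low_complexity("ACDEFGHIKLMN" + "Q" * 12))
```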

Unless otherwise stated, nucleotide and protein identity/similarity values provided herein are calculated using GAP (GCG Version 10) under default values.

GAP (Global Alignment Program) can also be used to compare a polynucleotide or polypeptide of the present invention with a reference sequence. GAP uses the algorithm of Needleman and Wunsch (J. Mol. Biol. 48: 443-453, 1970) to find the alignment of two complete sequences that maximizes the number of matches and minimizes the number of gaps. GAP considers all possible alignments and gap positions and creates the alignment with the largest number of matched bases and the fewest gaps. It allows for the provision of a gap creation penalty and a gap extension penalty in units of matched bases. GAP must make a profit of gap creation penalty number of matches for each gap it inserts. If a gap extension penalty greater than zero is chosen, GAP must, in addition, make a profit for each gap inserted of the length of the gap times the gap extension penalty. Default gap creation penalty values and gap extension penalty values in Version 10 of the Wisconsin Genetics Software Package for protein sequences are 8 and 2, respectively. For nucleotide sequences the default gap creation penalty is 50 while the default gap extension penalty is 3. The gap creation and gap extension penalties can be expressed as an integer selected from the group of integers consisting of from 0 to 100. Thus, for example, the gap creation and gap extension penalties can each independently be: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40, 50, 60 or greater.
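The penalty arithmetic described above can be made concrete with a short sketch. It does not perform an alignment and is not the GAP program; it only totals the penalty that an already gapped pairwise alignment would incur, charging each gap the creation penalty once plus the extension penalty times the gap length, as the paragraph recites. The function name and the "-" gap character are assumptions; the default values shown are the protein defaults stated above.

```python
def total_gap_penalty(aligned_a, aligned_b,
                      gap_creation=8, gap_extension=2, gap_char="-"):
    """Total penalty for all gaps in a pairwise alignment.

    Each run of gap characters costs gap_creation once, plus
    gap_extension for every position in the run.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for row in (aligned_a, aligned_b):
        in_gap = False
        for ch in row:
            if ch == gap_char:
                if not in_gap:
                    total += gap_creation   # a new gap is opened
                    in_gap = True
                total += gap_extension      # charged per gap position
            else:
                in_gap = False
    return total


# One gap of length 2 under the protein defaults: 8 + 2 * 2 = 12.
print(total_gap_penalty("ACD--FG", "ACDEEFG"))
# One gap of length 1 under the nucleotide defaults: 50 + 3 = 53.
print(total_gap_penalty("AC-GT", "ACTGT", gap_creation=50, gap_extension=3))
```

Under this accounting, inserting the two-residue gap in the first example is worthwhile only if it gains more than 12 units of match score elsewhere in the alignment.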

GAP presents one member of the family of best alignments. There may be many members of this family, but no other member has a better quality. GAP displays four figures of merit for alignments: Quality, Ratio, Identity, and Similarity. The Quality is the metric maximized in order to align the sequences. Ratio is the quality divided by the number of bases in the shorter segment. Percent Identity is the percent of the symbols that actually match. Percent Similarity is the percent of the symbols that are similar. Symbols that are across from gaps are ignored. A similarity is scored when the scoring matrix value for a pair of symbols is greater than or equal to 0.50, the similarity threshold. The scoring matrix used in Version 10 of the Wisconsin Genetics Software Package is BLOSUM62 (see Henikoff & Henikoff (1992) Proc. Natl. Acad. Sci. USA 89:10915).
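The Percent Identity and Percent Similarity figures can be illustrated as follows. This is a sketch under stated assumptions rather than the GAP implementation: the small `TOY_SCORES` table is a hypothetical stand-in for a full scoring matrix such as BLOSUM62, and the denominator is taken to be the number of aligned positions not across from a gap, following the statement that such symbols are ignored.

```python
# Hypothetical miniature scoring table standing in for a full matrix.
TOY_SCORES = {frozenset("IV"): 3, frozenset("DE"): 2, frozenset("KR"): 2}


def toy_score(x, y):
    """Score a residue pair: 4 if identical, table value or -1 otherwise."""
    if x == y:
        return 4
    return TOY_SCORES.get(frozenset((x, y)), -1)


def identity_and_similarity(aligned_a, aligned_b, score=toy_score,
                            threshold=0.5, gap_char="-"):
    """Return (percent identity, percent similarity) for an alignment.

    Positions across from gaps are ignored. A pair counts as similar when
    its score is greater than or equal to the similarity threshold.
    """
    compared = identical = similar = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap_char or y == gap_char:
            continue
        compared += 1
        if x == y:
            identical += 1
        if score(x, y) >= threshold:
            similar += 1
    if compared == 0:
        return 0.0, 0.0
    return 100.0 * identical / compared, 100.0 * similar / compared


# Five compared positions: one identical (M/M) and four similar
# (M/M, K/R, V/I, D/E); A/G scores below the threshold.
print(identity_and_similarity("MKV-DA", "MRIQEG"))
```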

Multiple alignment of the sequences can be performed using the CLUSTAL method of alignment (Higgins and Sharp (1989) CABIOS. 5:151-153) with the default parameters (GAP PENALTY=10, GAP LENGTH PENALTY=10). Default parameters for pairwise alignments using the CLUSTAL method are KTUPLE=1, GAP PENALTY=3, WINDOW=5 and DIAGONALS SAVED=5.

(c) As used herein, “sequence identity” or “identity” in the context of two nucleic acid or polypeptide sequences includes reference to the residues in the two sequences which are the same when aligned for maximum correspondence over a specified comparison window. When percentage of sequence identity is used in reference to proteins it is recognized that residue positions which are not identical often differ by conservative amino acid substitutions, where amino acid residues are substituted for other amino acid residues with similar chemical properties (e.g. charge or hydrophobicity) and therefore do not change the functional properties of the molecule. Where sequences differ in conservative substitutions, the percent sequence identity may be adjusted upwards to correct for the conservative nature of the substitution. Sequences which differ by such conservative substitutions are said to have “sequence similarity” or “similarity”. Means for making this adjustment are well-known to those of skill in the art. Typically this involves scoring a conservative substitution as a partial rather than a full mismatch, thereby increasing the percentage sequence identity. Thus, for example, where an identical amino acid is given a score of 1 and a non-conservative substitution is given a score of zero, a conservative substitution is given a score between zero and 1. The scoring of conservative substitutions is calculated, e.g., according to the algorithm of Meyers and Miller, Computer Applic. Biol. Sci., 4:11-17 (1988) e.g., as implemented in the program PC/GENE (Intelligenetics, Mountain View, Calif., USA).

(d) As used herein, “percentage of sequence identity” means the value determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of sequence identity.
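The calculation recited in definition (d) can be written out as a brief sketch. The two input strings are assumed to be already optimally aligned and of equal length, with "-" marking a gap; the function name and the gap character are illustrative choices. Gapped positions count toward the total number of positions in the window but never toward the matched positions.

```python
def percent_sequence_identity(aligned_query, aligned_reference, gap_char="-"):
    """Percentage of sequence identity over an aligned comparison window.

    matched positions / total positions in the window * 100
    """
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")
    window = len(aligned_query)
    if window == 0:
        raise ValueError("comparison window is empty")
    matched = sum(
        1
        for q, r in zip(aligned_query, aligned_reference)
        if q == r and q != gap_char
    )
    return 100.0 * matched / window


# Ten positions in the window, eight of them matched.
print(percent_sequence_identity("ACGT-ACGTA", "ACGTTACGAA"))
```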

A “substantial portion” of an amino acid or nucleotide sequence comprises an amino acid or a nucleotide sequence that is sufficient to afford putative identification of the protein or gene that the amino acid or nucleotide sequence comprises. Amino acid and nucleotide sequences can be evaluated either manually by one skilled in the art, or by using computer-based sequence comparison and identification tools that employ algorithms such as those described above. In general, a sequence of ten or more contiguous amino acids, or thirty or more contiguous nucleotides, is necessary in order to putatively identify a polypeptide or nucleic acid sequence as homologous to a known protein or gene. Moreover, with respect to nucleotide sequences, gene-specific oligonucleotide probes comprising 30 or more contiguous nucleotides may be used in sequence-dependent methods of gene identification (e.g., Southern hybridization) and isolation (e.g., in situ hybridization of bacterial colonies or bacteriophage plaques). In addition, short oligonucleotides of 12 or more nucleotides may be used as amplification primers in PCR in order to obtain a particular nucleic acid fragment comprising the primers. Accordingly, a “substantial portion” of a nucleotide sequence comprises a nucleotide sequence that will afford specific identification and/or isolation of a nucleic acid fragment comprising the sequence. The instant specification teaches amino acid and nucleotide sequences encoding polypeptides that comprise one or more particular plant proteins. The skilled artisan, having the benefit of the sequences as reported herein, may now use all or a substantial portion of the disclosed sequences for purposes known to those skilled in this art. Accordingly, the instant invention comprises the complete sequences as reported in the accompanying Sequence Listing, as well as substantial portions of those sequences as defined above.

“Codon degeneracy” refers to divergence in the genetic code permitting variation of the nucleotide sequence without affecting the amino acid sequence of an encoded polypeptide. Accordingly, the instant invention relates to any nucleic acid fragment comprising a nucleotide sequence that encodes all or a substantial portion of the amino acid sequences set forth herein.

“Synthetic nucleic acid fragments” can be assembled from oligonucleotide building blocks that are chemically synthesized using procedures known to those skilled in the art. These building blocks are ligated and annealed to form larger nucleic acid fragments which may then be enzymatically assembled to construct the entire desired nucleic acid fragment. “Chemically synthesized”, as related to a nucleic acid fragment, means that the component nucleotides were assembled in vitro. Manual chemical synthesis of nucleic acid fragments may be accomplished using well established procedures, or automated chemical synthesis can be performed using one of a number of commercially available machines. Accordingly, the nucleic acid fragments can be tailored for optimal gene expression based on optimization of the nucleotide sequence to reflect the codon bias of the host cell. The skilled artisan appreciates the likelihood of successful gene expression if codon usage is biased towards those codons favored by the host. Determination of preferred codons can be based on a survey of genes derived from the host cell where sequence information is available.
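As a minimal sketch of how preferred codons might be determined from a survey of host genes and then used to tailor a coding sequence, consider the following. The codon table is deliberately restricted to a few codons of the standard genetic code, and the "host" coding sequences are invented for the example; neither represents actual codon usage of maize or of any other organism.

```python
from collections import Counter, defaultdict

# Small subset of the standard genetic code, sufficient for the example.
CODON_TO_AA = {
    "ATG": "M",
    "AAA": "K", "AAG": "K",
    "CTG": "L", "CTC": "L", "TTA": "L",
    "GAA": "E", "GAG": "E",
}


def survey_preferred_codons(host_coding_sequences):
    """Return, for each amino acid, the codon seen most often in the survey."""
    counts = defaultdict(Counter)
    for cds in host_coding_sequences:
        usable = len(cds) - len(cds) % 3      # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = cds[i:i + 3]
            amino_acid = CODON_TO_AA.get(codon)
            if amino_acid is not None:
                counts[amino_acid][codon] += 1
    return {aa: c.most_common(1)[0][0] for aa, c in counts.items()}


def back_translate(protein, preferred):
    """Encode a protein using only the preferred codon for each residue."""
    return "".join(preferred[aa] for aa in protein)


# Hypothetical survey of two host coding sequences.
host_genes = ["ATGAAGCTGGAGAAGCTC", "ATGAAGCTGGAAGAG"]
preferred = survey_preferred_codons(host_genes)
print(back_translate("MKLE", preferred))
```

Because of codon degeneracy, the sequence returned by `back_translate` encodes the same polypeptide as any other choice of synonymous codons; only the nucleotide sequence differs.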

“Gene” refers to a nucleic acid fragment which directs expression of a specific protein, including regulatory sequences preceding (5′ non-coding sequences) and following (3′ non-coding sequences) the coding sequence. “Native gene” refers to a gene as found in nature with its own regulatory sequences. “Chimeric gene” refers to any gene that is not a native gene, comprising regulatory and coding sequences that are not found together in nature. Accordingly, a chimeric gene may comprise regulatory sequences and coding sequences that are derived from different sources, or regulatory sequences and coding sequences derived from the same source, but arranged in a manner different than that found in nature. “Endogenous gene” refers to a native gene in its natural location in the genome of an organism. A “foreign gene” refers to a gene not normally found in the host organism, but that is introduced into the host organism by gene transfer. Foreign genes can comprise native genes inserted into a non-native organism, or chimeric genes. A “transgene” is a gene that has been introduced into the genome by a transformation procedure.

“Coding sequence” refers to a nucleotide sequence that codes for a specific amino acid sequence. “Regulatory sequences” refer to nucleotide sequences located upstream (5′ non-coding sequences), within, or downstream (3′ non-coding sequences) of a coding sequence, and which influence the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences may include promoters, translation leader sequences, introns, binding sites for regulatory proteins, and polyadenylation recognition sequences.

“Promoter” refers to a nucleotide sequence capable of controlling the expression of a coding sequence or functional RNA. In general, a coding sequence is located 3′ to a promoter sequence. The promoter sequence consists of proximal and more distal upstream elements, the latter elements often referred to as enhancers. Accordingly, an “enhancer” is a nucleotide sequence which can stimulate promoter activity and may be an innate element of the promoter or a heterologous element inserted to enhance the level or tissue-specificity of a promoter. Enhancer elements for plants are known in the art and include, for example, the SV40 enhancer region, the 35S enhancer element, and the like.

Promoters may be derived in their entirety from a native gene, or may be composed of different elements derived from different promoters found in nature, or may even comprise synthetic nucleotide segments. It is understood by those skilled in the art that different promoters may direct the expression of a gene in different tissues or cell types, or at different stages of development, or in response to different environmental conditions. Promoters which cause a nucleic acid fragment to be expressed in most cell types at most times are commonly referred to as “constitutive promoters”. New promoters of various types useful in plant cells are constantly being discovered; numerous examples may be found in the compilation by Okamuro and Goldberg (1989) Biochemistry of Plants 15:1-82. Constitutive promoters include, for example, the core promoter of the Rsyn7 (U.S. Pat. No. 6,072,050); the core CaMV 35S promoter (Odell et al. (1985) Nature 313:810-812); rice actin (McElroy et al. (1990) Plant Cell 2:163-171); ubiquitin (Christensen et al. (1989) Plant Mol. Biol. 12:619-632 and Christensen et al. (1992) Plant Mol. Biol. 18:675-689); PEMU (Last et al. (1991) Theor. Appl. Genet. 81:581-588); MAS (Velten et al. (1984) EMBO J. 3:2723-2730); ALS promoter (U.S. Pat. No. 5,659,026), and the like. Other constitutive promoters include, for example, U.S. Pat. Nos. 5,608,149; 5,608,144; 5,604,121; 5,569,597; 5,466,785; 5,399,680; 5,268,463; and 5,608,142.

It is further recognized that since in most cases the exact boundaries of regulatory sequences have not been completely defined, nucleic acid fragments of different lengths may have identical promoter activity.

By “tissue-preferred” is intended that the expression driven by a plant promoter is selectively enhanced or suppressed in particular plant cells or tissues, in comparison to other cells or tissues.

By “promoter” or “transcriptional initiation region” is intended a regulatory region of DNA usually comprising a TATA box capable of directing RNA polymerase II to initiate RNA synthesis at the appropriate transcription initiation site for a particular coding sequence. A promoter may additionally comprise other recognition sequences generally positioned upstream or 5′ to the TATA box, and referred to as “promoter elements” which influence the expression driven by the core promoter. Promoter elements located upstream or 5′ to the TATA box are also referred to as upstream promoter elements. In particular embodiments of the invention, the promoter elements of the invention are positioned upstream or 5′ to the TATA box. However, the invention also encompasses plant promoter configurations in which the promoter elements are positioned downstream or 3′ to the TATA box.

By “transcription regulatory unit” is intended a promoter comprising one or more promoter elements.

By “core promoter” is intended a promoter not comprising promoter elements other than the TATA box and the transcriptional start site.

In reference to a promoter, by “native” is intended a promoter capable of driving expression in a cell of interest, wherein the nucleotide sequence of the promoter is found in that cell in nature.

In reference to a promoter or transcription initiation region, by “synthetic” is intended a promoter capable of driving expression in a cell of interest, wherein the nucleotide sequence of the promoter is not found in nature. A synthetic promoter cannot be isolated from any cell unless it is first introduced to the cell or to an ancestor thereof.

By “suppressors” are intended nucleotide sequences that mediate suppression or decrease in the expression directed by a promoter region. That is, suppressors are the DNA sites through which transcription repressor proteins exert their effects. Suppressors can mediate suppression of expression by overlapping transcription start sites or transcription activator sites, or they can mediate suppression from distinct locations with respect to these sites.

Modifications of the promoter sequences of the present invention can provide for a range of expression. Generally, by “weak promoter” is intended a promoter that drives expression of a coding sequence at a low level. By “low level” is intended at levels of about 1/10,000 transcripts to about 1/100,000 transcripts to about 1/500,000 transcripts. Conversely, a strong promoter drives expression of a coding sequence at a high level, or at about 1/10 transcripts to about 1/100 transcripts to about 1/1,000 transcripts.

The nucleotide sequences for the plant promoters of the present invention comprise the sequences set forth in SEQ ID NOS: 73 and 74 or any sequence having substantial identity to the sequences. By “substantial identity” is intended a sequence exhibiting substantial functional and structural equivalence with the sequence set forth. Any functional or structural differences between substantially identical sequences do not affect the ability of the sequence to function as a promoter as disclosed in the present invention.

Promoters comprising biologically active fragments of SEQ ID NOS: 73 and 74 of the invention are also encompassed by the present invention. By “fragment” is intended a portion of the promoter nucleotide sequence that is shorter than the full-length promoter sequence and which may retain biological activity. Alternatively, fragments of a nucleotide sequence that are useful as hybridization probes or PCR primers generally do not retain biological activity. Thus, fragments of a nucleotide sequence may range from at least about 15, 20, or 25 nucleotides, and up to but not including the full length of a nucleotide sequence of the invention.

The invention encompasses variants of the plant promoters. By “variants” is intended substantially identical sequences. Naturally-occurring variants of the promoter sequences can be identified and/or isolated with the use of well-known molecular biology techniques, as, for example, with PCR and hybridization techniques as outlined below.

Variant promoter nucleotide sequences include synthetically derived nucleotide sequences, such as those generated, for example, by using site-directed mutagenesis or automated oligonucleotide synthesis, but which still exhibit promoter activity. Methods for mutagenesis and nucleotide sequence alterations are well known in the art. See, for example, Kunkel (1985) Proc. Natl. Acad. Sci. USA 82:488-492; Kunkel et al. (1987) Methods in Enzymol. 154:367-382; U.S. Pat. No. 4,873,192; Walker and Gaastra, eds. (1983) Techniques in Molecular Biology (MacMillan Publishing Company, New York) and the references cited therein. Generally, a nucleotide sequence of the invention will have at least 80%, preferably 85%, 90%, 95%, up to 98% or more sequence identity to its respective reference promoter nucleotide sequence, and enhance or promote expression of heterologous coding sequences in plants or plant cells.

Biologically active variants of the promoter element sequences should retain promoter regulatory activity, and thus enhance or suppress expression of a nucleotide sequence operably linked to a transcription regulatory unit comprising the promoter element. Promoter activity may be measured by Northern blot analysis. See, for example, Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, N.Y.); herein incorporated by reference. Protein expression indicative of promoter activity can be measured by determining the activity of a protein encoded by the coding sequence operably linked to the particular promoter; including but not limited to such examples as GUS (β-glucuronidase; Jefferson (1987) Plant Mol. Biol. Rep. 5:387), GFP (green fluorescent protein; Chalfie et al. (1994) Science 263:802), luciferase (Riggs et al. (1987) Nucleic Acids Res. 15(19):8115 and Luehrsen et al. (1992) Methods Enzymol. 216:397-414), and the maize genes encoding anthocyanin production (Ludwig et al. (1990) Science 247:449).

The invention also encompasses nucleotide sequences which hybridize to the promoter element sequences of the invention under stringent conditions, and enhance or suppress expression of a nucleotide sequence operably linked to a transcription regulatory unit comprising the promoter sequences. Hybridization methods are known in the art. See, for example, Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, N.Y.). See also Innis et al., eds. (1990) PCR Protocols: A Guide to Methods and Applications (Academic Press, New York); Innis and Gelfand, eds. (1995) PCR Strategies (Academic Press, New York); and Innis and Gelfand, eds. (1999) PCR Methods Manual (Academic Press, New York).

An “isolated” or “purified” nucleic acid molecule, or biologically active portion thereof, is substantially free of other cellular material, or culture medium when produced by recombinant techniques, or substantially free of chemical precursors or other chemicals when chemically synthesized.

“Translation leader sequence” refers to a nucleotide sequence located between the promoter sequence of a gene and the coding sequence. The translation leader sequence is present in the fully processed mRNA upstream of the translation start sequence. The translation leader sequence may affect processing of the primary transcript to mRNA, mRNA stability or translation efficiency. Examples of translation leader sequences have been described (Turner and Foster (1995) Mol. Biotechnol. 3:225-236).

The term “3′ non-coding sequences” refers to nucleotide sequences located downstream of a coding sequence and includes polyadenylation recognition sequences and other sequences encoding regulatory signals capable of affecting mRNA processing or gene expression. The polyadenylation signal is usually characterized by affecting the addition of polyadenylic acid tracts to the 3′ end of the mRNA precursor. The use of different 3′ non-coding sequences is exemplified by Ingelbrecht et al. (1989) Plant Cell 1:671-680.

“RNA transcript” refers to the product resulting from RNA polymerase-catalyzed transcription of a DNA sequence. When the RNA transcript is a perfect complementary copy of the DNA sequence, it is referred to as the primary transcript. An RNA sequence derived from post-transcriptional processing of the primary transcript is referred to as the mature RNA. “Messenger RNA (mRNA)” refers to the RNA that is without introns and that can be translated into polypeptides by the cell. “cDNA” refers to DNA that is complementary to and derived from an mRNA template. The cDNA can be single-stranded or converted to double-stranded form using, for example, the Klenow fragment of DNA polymerase I. “Sense-RNA” refers to an RNA transcript that includes the mRNA and so can be translated into a polypeptide by the cell. “Antisense RNA” refers to an RNA transcript that is complementary to all or part of a target primary transcript or mRNA and that blocks the expression of a target gene (see U.S. Pat. No. 5,107,065, incorporated herein by reference). The complementarity of an antisense RNA may be with any part of the specific nucleotide sequence, i.e., at the 5′ non-coding sequence, 3′ non-coding sequence, introns, or the coding sequence. “Functional RNA” refers to sense RNA, antisense RNA, ribozyme RNA, or other RNA that may not be translated but yet has an effect on cellular processes.
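The sequence relationship between sense and antisense RNA can be shown with a short sketch. Given the coding (sense) strand of a DNA segment written 5′ to 3′, the sense RNA has the same sequence with U in place of T, and the antisense RNA is its reverse complement. The function names and the six-nucleotide example are hypothetical; only unambiguous upper-case A, C, G, and T are handled.

```python
# Translation table mapping each DNA base to its complement.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return dna.translate(DNA_COMPLEMENT)[::-1]


def sense_rna(coding_strand):
    """RNA with the same sequence as the coding strand (T replaced by U)."""
    return coding_strand.replace("T", "U")


def antisense_rna(coding_strand):
    """RNA complementary to the sense transcript, written 5' to 3'."""
    return reverse_complement(coding_strand).replace("T", "U")


coding = "ATGGCC"
print(sense_rna(coding))      # transcript carrying the coding information
print(antisense_rna(coding))  # pairs with the sense transcript, antiparallel
```

An antisense RNA directed to only part of a target, such as a 5′ non-coding sequence, could be obtained in the same way by passing the corresponding subsequence of the coding strand.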

The term “operably linked” refers to the association of two or more nucleic acid fragments on a single polynucleotide so that the function of one is affected by the other. For example, a promoter is operably linked with a coding sequence when it is capable of affecting the expression of that coding sequence (i.e., that the coding sequence is under the transcriptional control of the promoter). Coding sequences can be operably linked to regulatory sequences in sense or antisense orientation.

The term “expression”, as used herein, refers to the transcription and stable accumulation of sense (mRNA) or antisense RNA derived from the nucleic acid fragment of the invention. Expression may also refer to translation of mRNA into a polypeptide. “Antisense inhibition” refers to the production of antisense RNA transcripts capable of suppressing the expression of the target protein. “Overexpression” refers to the production of a gene product in transgenic organisms that exceeds levels of production in normal or non-transformed organisms. “Co-suppression” refers to the production of sense RNA transcripts capable of suppressing the expression of identical or substantially similar foreign or endogenous genes (U.S. Pat. No. 5,231,020, incorporated herein by reference).

A “protein” or “polypeptide” is a chain of amino acids arranged in a specific order determined by the coding sequence in a polynucleotide encoding the polypeptide. Each protein or polypeptide has a unique function.

“Altered levels” or “altered expression” refers to the production of gene product(s) in transgenic organisms in amounts or proportions that differ from that of normal or non-transformed organisms.

“Null mutant” refers here to a host cell which either lacks the expression of a certain polypeptide or expresses a polypeptide which is inactive or does not have any detectable expected enzymatic function.

“Mature protein” or the term “mature” when used in describing a protein refers to a post-translationally processed polypeptide; i.e., one from which any pre- or propeptides present in the primary translation product have been removed. “Precursor protein” or the term “precursor” when used in describing a protein refers to the primary product of translation of mRNA; i.e., with pre- and propeptides still present. Pre- and propeptides may be, but are not limited to, intracellular localization signals.

A “chloroplast transit peptide” is an amino acid sequence which is translated in conjunction with a protein and directs the protein to the chloroplast or other plastid types present in the cell in which the protein is made. “Chloroplast transit sequence” refers to a nucleotide sequence that encodes a chloroplast transit peptide. A “signal peptide” is an amino acid sequence which is translated in conjunction with a protein and directs the protein to the secretory system (Chrispeels (1991) Ann. Rev. Plant Phys. Plant Mol. Biol. 42:21-53). If the protein is to be directed to a vacuole, a vacuolar targeting signal (supra) can further be added, or if to the endoplasmic reticulum, an endoplasmic reticulum retention signal (supra) may be added. If the protein is to be directed to the nucleus, any signal peptide present should be removed and instead a nuclear localization signal included (Raikhel (1992) Plant Phys. 100:1627-1632).

“Transformation” refers to the transfer of a nucleic acid fragment into the genome of a host organism, resulting in genetically stable inheritance. Host organisms containing the transformed nucleic acid fragments are referred to as “transgenic” organisms. Examples of methods of plant transformation include Agrobacterium-mediated transformation (De Blaere et al. (1987) Meth. Enzymol. 143:277) and particle-accelerated or “gene gun” transformation technology (Klein et al. (1987) Nature (London) 327:70-73; U.S. Pat. No. 4,945,050, incorporated herein by reference). Thus, isolated polynucleotides of the present invention can be incorporated into recombinant constructs, typically DNA constructs, capable of introduction into and replication in a host cell. Such a construct can be a vector that includes a replication system and sequences that are capable of transcription and translation of a polypeptide-encoding sequence in a given host cell. A number of vectors suitable for stable transfection of plant cells or for the establishment of transgenic plants have been described in, e.g., Pouwels et al., Cloning Vectors: A Laboratory Manual, 1985, supp. 1987; Weissbach and Weissbach, Methods for Plant Molecular Biology, Academic Press, 1989; and Gelvin et al., Plant Molecular Biology Manual, Kluwer Academic Publishers, 1990. Typically, plant expression vectors include, for example, one or more cloned plant genes under the transcriptional control of 5′ and 3′ regulatory sequences and a dominant selectable marker. Such plant expression vectors also can contain a promoter regulatory region (e.g., a regulatory region controlling inducible or constitutive, environmentally- or developmentally-regulated, or cell- or tissue-specific expression), a transcription initiation start site, a ribosome binding site, an RNA processing signal, a transcription termination site, and/or a polyadenylation signal.

Standard recombinant DNA and molecular cloning techniques used herein are well known in the art and are described more fully in Sambrook et al. Molecular Cloning: A Laboratory Manual; Cold Spring Harbor Laboratory Press: Cold Spring Harbor, 1989 (hereinafter “Maniatis”).

“PCR” or “polymerase chain reaction” is well known by those skilled in the art as a technique used for the amplification of specific DNA segments (U.S. Pat. Nos. 4,683,195 and 4,800,159).

As used herein, the term “plant” includes reference to whole plants and their progeny; plant cells; plant parts or organs, such as embryos, pollen, ovules, seeds, flowers, kernels, ears, cobs, leaves, husks, stalks, stems, roots, root tips, anthers, silk and the like. Plant cell, as used herein, further includes, without limitation, cells obtained from or found in: seeds, suspension cultures, embryos, meristematic regions, callus tissue, leaves, roots, shoots, gametophytes, sporophytes, pollen, and microspores. Plant cells can also be understood to include modified cells, such as protoplasts, obtained from the aforementioned tissues. The class of plants which can be used in the methods of the invention is generally as broad as the class of higher plants amenable to transformation techniques, including both monocotyledonous and dicotyledonous plants. A particularly preferred plant is Zea mays.

The nucleotide sequences for the promoters of the invention are provided in expression cassettes along with nucleotide sequences of interest for expression in the plant of interest. Such nucleotide constructs or expression cassettes will comprise a transcriptional initiation region in combination with a promoter element operably linked to the nucleotide sequence whose expression is to be controlled by the promoters disclosed herein. Such a construct is provided with a plurality of restriction sites for insertion of the nucleotide sequence to be under the transcriptional regulation of the regulatory regions. The expression cassette may additionally contain selectable marker genes.

The transcriptional cassette will include in the 5′-to-3′ direction of transcription, a transcriptional and translational initiation region, one or more promoter elements, a nucleotide sequence of interest, and a transcriptional and translational termination region functional in plant cells. The termination region may be native with the transcriptional initiation region comprising one or more of the promoter nucleotide sequences of the present invention, may be native with the DNA sequence of interest, or may be derived from another source. Convenient termination regions are available from the Ti-plasmid of A. tumefaciens, such as the octopine synthase and nopaline synthase termination regions. See also, Guerineau et al. (1991) Mol. Gen. Genet. 262:141-144; Proudfoot (1991) Cell 64:671-674; Sanfacon et al. (1991) Genes Dev. 5:141-149; Mogen et al., (1990) Plant Cell 2:1261-1272; Munroe et al. (1990) Gene 91:151-158; Ballas et al. (1989) Nucleic Acids Res. 17:7891-7903; Joshi et al. (1987) Nucleic Acids Res. 15:9627-9639.

The expression cassette comprising the transcription regulatory unit of the invention operably linked to a nucleotide sequence may also contain at least one additional nucleotide sequence for a gene to be cotransformed into the organism. Alternatively, the additional sequence(s) can be provided on another expression cassette.

Where appropriate, the nucleotide sequence whose expression is to be under the control of the promoter sequence of the present invention, and any additional nucleotide sequence(s), may be optimized for increased expression in the transformed plant. That is, these nucleotide sequences can be synthesized using plant-preferred codons for improved expression. Methods are available in the art for synthesizing plant-preferred nucleotide sequences. See, for example, U.S. Pat. Nos. 5,380,831 and 5,436,391, and Murray et al., (1989) Nucleic Acids Res. 17:477-498, herein incorporated by reference.
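By way of illustration only, synthesis of a coding sequence from preferred codons can be sketched as a simple back-translation. The codon table below is a hypothetical placeholder chosen for the example (one G/C-ending codon per amino acid); it is not taken from the cited references and does not represent measured codon usage for any plant. The function name back_translate is likewise an assumption of this sketch.

```python
# Hypothetical "preferred codon" table: one codon per amino acid.
# Placeholder values for illustration, NOT measured plant codon usage.
PREFERRED_CODON = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCG",
    "S": "TCC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
    "*": "TGA",  # stop codon
}


def back_translate(peptide: str) -> str:
    """Return a DNA sequence encoding `peptide` using the table above.

    `peptide` uses one-letter amino acid codes; '*' denotes a stop.
    """
    codons = []
    for residue in peptide.upper():
        if residue not in PREFERRED_CODON:
            raise ValueError(f"unknown residue: {residue!r}")
        codons.append(PREFERRED_CODON[residue])
    return "".join(codons)


if __name__ == "__main__":
    print(back_translate("MKW*"))
```

In practice the table would be replaced by codon frequencies derived from genes known to be expressed in the host plant.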

Additional sequence modifications are known to enhance gene expression in a cellular host. These include elimination of sequences encoding spurious polyadenylation signals, exon-intron splice site signals, transposon-like repeats, and other such well-characterized sequences that may be deleterious to gene expression. The G-C content of the nucleotide sequence of interest may be adjusted to levels average for a given cellular host, as calculated by reference to known genes expressed in the host cell. When possible, the sequence is modified to avoid predicted hairpin secondary mRNA structures.
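As a minimal sketch of the checks described above, the G-C content of a candidate sequence can be computed and compared with a host average, and the sequence can be scanned for a motif of concern. The function names are hypothetical, and the hexamer AATAAA is used only as an example of a polyadenylation-like motif; the motifs actually screened for in a given host are an assumption left to the practitioner.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in `seq` (0.0 to 1.0)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(1 for base in seq if base in "GC") / len(seq)


def motif_positions(seq: str, motif: str = "AATAAA") -> list:
    """0-based start positions of every (possibly overlapping) motif match."""
    seq, motif = seq.upper(), motif.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


if __name__ == "__main__":
    candidate = "ATGGCCAATAAAGGCTGA"  # made-up example sequence
    print(f"GC content: {gc_content(candidate):.2f}")
    print("AATAAA sites:", motif_positions(candidate))
```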

The expression cassettes may additionally contain 5′ leader sequences in the expression cassette construct. Such leader sequences can act to enhance translation. Translation leaders are known in the art and include: picornavirus leaders, for example, EMCV leader (Encephalomyocarditis 5′ noncoding region) (Elroy-Stein et al. (1989) Proc. Natl. Acad. Sci. USA 86:6126-6130); potyvirus leaders, for example, TEV leader (Tobacco Etch Virus) (Allison et al. (1986)); MDMV leader (Maize Dwarf Mosaic Virus) (Virology 154:9-20); human immunoglobulin heavy-chain binding protein (BiP) (Macejak and Sarnow (1991) Nature 353:90-94); untranslated leader from the coat protein mRNA of alfalfa mosaic virus (AMV RNA 4) (Jobling and Gehrke (1987) Nature 325:622-625); tobacco mosaic virus leader (TMV) (Gallie et al. (1989) Molecular Biology of RNA, pages 237-256); and maize chlorotic mottle virus leader (MCMV) (Lommel et al. (1991) Virology 81:382-385). See also Della-Cioppa et al. (1987) Plant Physiology 84:965-968. Other methods known to enhance translation and/or mRNA stability can also be utilized, for example, introns, and the like.

In preparing the expression cassette, the various DNA fragments may be manipulated, so as to provide for the DNA sequences in the proper orientation and, as appropriate, in the proper reading frame. Toward this end, adapters or linkers may be employed to join the DNA fragments or other manipulations may be involved to provide for convenient restriction sites, removal of superfluous DNA, removal of restriction sites, or the like. For this purpose, in vitro mutagenesis, primer repair, restriction, annealing, substitutions, for example, transitions and transversions, may be involved.

The promoters may be used to drive reporter genes or selectable marker genes. Examples of suitable reporter genes known in the art can be found in, for example, Jefferson et al. (1991) in Plant Molecular Biology Manual, ed. Gelvin et al. (Kluwer Academic Publishers), pp. 1-33; DeWet et al. (1987) Mol. Cell. Biol. 7:725-737; Goff et al. (1990) EMBO J. 9:2517-2522; Kain et al. (1995) BioTechniques 19:650-655; and Chiu et al. (1996) Current Biology 6:325-330.

Selectable marker genes for selection of transformed cells or tissues can include genes that confer antibiotic resistance or resistance to herbicides. Examples of suitable selectable marker genes include, but are not limited to, genes encoding resistance to chloramphenicol (Herrera Estrella et al. (1983) EMBO J. 2:987-992); methotrexate (Herrera Estrella et al. (1983) Nature 303:209-213; Meijer et al. (1991) Plant Mol. Biol. 16:807-820); hygromycin (Waldron et al. (1985) Plant Mol. Biol. 5:103-108; Zhijian et al. (1995) Plant Science 108:219-227); streptomycin (Jones et al. (1987) Mol. Gen. Genet. 210:86-91); spectinomycin (Bretagne-Sagnard et al. (1996) Transgenic Res. 5:131-137); bleomycin (Hille et al. (1990) Plant Mol. Biol. 7:171-176); sulfonamide (Guerineau et al. (1990) Plant Mol. Biol. 15:127-136); bromoxynil (Stalker et al. (1988) Science 242:419-423); glyphosate (Shaw et al. (1986) Science 233:478-481); and phosphinothricin (DeBlock et al. (1987) EMBO J. 6:2513-2518).

Other genes that could serve utility in the recovery of transgenic events but might not be required in the final product would include, but are not limited to, such examples as GUS (β-glucuronidase; Jefferson (1987) Plant Mol. Biol. Rep. 5:387), GFP (green fluorescent protein; Chalfie et al. (1994) Science 263:802), luciferase (Riggs et al. (1987) Nucleic Acids Res. 15(19):8115 and Luehrsen et al. (1992) Methods Enzymol. 216:397-414), and the maize genes encoding for anthocyanin production (Ludwig et al. (1990) Science 247:449).

The expression cassette comprising the transcription regulatory unit of the present invention operably linked to a nucleotide sequence of interest can be used to transform any plant. In this manner, genetically modified plants, plant cells, plant tissue, seed, and the like can be obtained. Transformation protocols as well as protocols for introducing nucleotide sequences into plants may vary depending on the type of plant or plant cell, i.e., monocot or dicot, targeted for transformation. Suitable methods of introducing nucleotide sequences into plant cells and subsequent insertion into the plant genome include microinjection (Crossway et al. (1986) Biotechniques 4:320-334), electroporation (Riggs et al. (1986) Proc. Natl. Acad. Sci. USA 83:5602-5606), Agrobacterium-mediated transformation (Townsend et al., U.S. Pat. No. 5,563,055), direct gene transfer (Paszkowski et al. (1984) EMBO J. 3:2717-2722), and ballistic particle acceleration (see, for example, Sanford et al., U.S. Pat. No. 4,945,050; Tomes et al., (1995) “Direct DNA Transfer into Intact Plant Cells via Microprojectile Bombardment,” in Plant Cell, Tissue, and Organ Culture: Fundamental Methods, ed. Gamborg and Phillips (Springer-Verlag, Berlin); and McCabe et al. (1988) Biotechnology 6:923-926). Also see Weissinger et al. (1988) Ann. Rev. Genet. 22:421-477; Sanford et al. (1987) Particulate Science and Technology 5:27-37 (onion); Christou et al. (1988) Plant Physiol. 87:671-674 (soybean); McCabe et al. (1988) Bio/Technology 6:923-926 (soybean); Finer and McMullen (1991) In Vitro Cell Dev. Biol. 27P:175-182 (soybean); Singh et al. (1998) Theor. Appl. Genet. 96:319-324 (soybean); Datta et al. (1990) Biotechnology 8:736-740 (rice); Klein et al. (1988) Proc. Natl. Acad. Sci. USA 85:4305-4309 (maize); Klein et al. (1988) Biotechnology 6:559-563 (maize); Tomes, U.S. Pat. No. 5,240,855; Buising et al., U.S. Pat. Nos. 5,322,783 and 5,324,646; Tomes et al. (1995) “Direct DNA Transfer into Intact Plant Cells via Microprojectile Bombardment,” in Plant Cell, Tissue, and Organ Culture: Fundamental Methods, ed. Gamborg (Springer-Verlag, Berlin) (maize); Klein et al. (1988) Plant Physiol. 91:440-444 (maize); Fromm et al. (1990) Biotechnology 8:833-839 (maize); Hooykaas-Van Slogteren et al. (1984) Nature (London) 311:763-764; Bytebier et al. (1987) Proc. Natl. Acad. Sci. USA 84:5345-5349 (Liliaceae); De Wet et al. (1985) in The Experimental Manipulation of Ovule Tissues, ed. Chapman et al. (Longman, New York), pp. 197-209 (pollen); Kaeppler et al. (1990) Plant Cell Reports 9:415-418 and Kaeppler et al. (1992) Theor. Appl. Genet. 84:560-566 (whisker-mediated transformation); D'Halluin et al. (1992) Plant Cell 4:1495-1505 (electroporation); Li et al. (1993) Plant Cell Reports 12:250-255 and Christou and Ford (1995) Annals of Botany 75:407-413 (rice); Osjoda et al. (1996) Nature Biotechnology 14:745-750 (maize via Agrobacterium tumefaciens); all of which are herein incorporated by reference.

In certain preferred embodiments in this regard, the vectors provide for preferred expression. Such preferred expression may be inducible expression, or temporally limited, or restricted to predominantly certain types of cells, or any combination of the above. Particularly preferred among inducible vectors are vectors that can be induced for expression by environmental factors that are easy to manipulate, such as temperature and nutrient additives. A variety of vectors suitable to this aspect of the invention, including constitutive and inducible expression vectors for use in prokaryotic and eukaryotic hosts, are well known and employed routinely by those of skill in the art. Such vectors include, among others, chromosomal, episomal and virus-derived vectors, e.g., vectors derived from bacterial plasmids, from bacteriophage, from transposons, from yeast episomes, from insertion elements, from yeast chromosomal elements, from viruses such as baculoviruses, papovaviruses, such as SV40, vaccinia viruses, adenoviruses, fowl pox viruses, pseudorabies viruses and retroviruses, and vectors derived from combinations thereof, such as those derived from plasmid and bacteriophage genetic elements, such as cosmids and phagemids and binaries used for Agrobacterium-mediated transformations. All may be used for expression in accordance with this aspect of the present invention.

The cells that have been transformed may be grown into plants in accordance with conventional ways. See, for example, McCormick et al. (1986) Plant Cell Reports 5:81-84. These plants may then be grown, and either pollinated with the same transformed strain or different strains, and the resulting hybrid having expression of the desired phenotypic characteristic identified. Two or more generations may be grown to ensure that expression of the desired phenotypic characteristic is stably maintained and inherited and then seeds harvested to ensure expression of the desired phenotypic characteristic has been achieved.

The present invention may be used for transformation of any plant species, including, but not limited to, maize (Zea mays), Brassica sp. (e.g., B. napus, B. rapa, B. juncea), particularly those Brassica species useful as sources of seed oil, alfalfa (Medicago sativa), rice (Oryza sativa), rye (Secale cereale), sorghum (Sorghum bicolor, Sorghum vulgare), millet (e.g., pearl millet (Pennisetum glaucum), proso millet (Panicum miliaceum), foxtail millet (Setaria italica), finger millet (Eleusine coracana)), sunflower (Helianthus annuus), safflower (Carthamus tinctorius), wheat (Triticum aestivum), soybean (Glycine max), tobacco (Nicotiana tabacum), potato (Solanum tuberosum), peanuts (Arachis hypogaea), cotton (Gossypium barbadense, Gossypium hirsutum), sweet potato (Ipomoea batatas), cassava (Manihot esculenta), coffee (Coffea spp.), coconut (Cocos nucifera), pineapple (Ananas comosus), citrus trees (Citrus spp.), cocoa (Theobroma cacao), tea (Camellia sinensis), banana (Musa spp.), avocado (Persea americana), fig (Ficus carica), guava (Psidium guajava), mango (Mangifera indica), olive (Olea europaea), papaya (Carica papaya), cashew (Anacardium occidentale), macadamia (Macadamia integrifolia), almond (Prunus amygdalus), sugar beets (Beta vulgaris), sugarcane (Saccharum spp.), oats, barley, vegetables, ornamentals, and conifers.

Vegetables include tomatoes (Lycopersicon esculentum), lettuce (e.g., Lactuca sativa), green beans (Phaseolus vulgaris), lima beans (Phaseolus limensis), peas (Lathyrus spp.), and members of the genus Cucumis such as cucumber (C. sativus), cantaloupe (C. cantalupensis), and musk melon (C. melo). Ornamentals include azalea (Rhododendron spp.), hydrangea (Hydrangea macrophylla), hibiscus (Hibiscus rosa-sinensis), roses (Rosa spp.), tulips (Tulipa spp.), daffodils (Narcissus spp.), petunias (Petunia hybrida), carnation (Dianthus caryophyllus), poinsettia (Euphorbia pulcherrima), and chrysanthemum. Conifers that may be employed in practicing the present invention include, for example, pines such as loblolly pine (Pinus taeda), slash pine (Pinus elliottii), ponderosa pine (Pinus ponderosa), lodgepole pine (Pinus contorta), and Monterey pine (Pinus radiata); Douglas-fir (Pseudotsuga menziesii); Western hemlock (Tsuga canadensis); Sitka spruce (Picea glauca); redwood (Sequoia sempervirens); true firs such as silver fir (Abies amabilis) and balsam fir (Abies balsamea); and cedars such as Western red cedar (Thuja plicata) and Alaska yellow-cedar (Chamaecyparis nootkatensis). Preferably, plants of the present invention are crop plants (for example, maize, alfalfa, sunflower, Brassica, soybean, cotton, safflower, peanut, sorghum, wheat, millet, tobacco, etc.), more preferably maize and soybean plants, yet more preferably maize plants.

Plants of particular interest include grain plants that provide seeds of interest, oil-seed plants, and leguminous plants. Seeds of interest include grain seeds, such as maize, wheat, barley, rice, sorghum, rye, etc. Oil-seed plants include cotton, soybean, safflower, sunflower, Brassica, maize, alfalfa, palm, coconut, etc. Leguminous plants include beans and peas. Beans include guar, locust bean, fenugreek, soybean, garden beans, cowpea, mungbean, lima bean, fava bean, lentils, chickpea, etc.

The promoter sequences and methods disclosed herein, comprising SEQ ID NOS: 73 and 74, are useful in regulating expression of a nucleotide sequence of interest in a host plant in a spatial-, temporal-, and/or tissue-preferred manner. Thus, the nucleotide sequence operably linked to the promoters disclosed herein may be a structural gene encoding a protein of interest. Examples of such genes include, but are not limited to, genes encoding proteins conferring resistance to abiotic stress, such as drought, temperature, salinity, and toxins such as pesticides and herbicides, or to biotic stress, such as attacks by fungi, viruses, bacteria, insects, and nematodes, and development of diseases associated with these organisms. Other examples include genes encoding proteins which modify plant reproduction, such as those affecting male or female sterility or fertility, or which preferentially express in maternal or paternal tissue.

Alternatively, the nucleotide sequence operably linked to one of the promoters disclosed herein may be an antisense sequence for a targeted gene. Thus, sequences can be constructed which are complementary to, and will hybridize with, the messenger RNA (mRNA) of the targeted gene. Modifications of the antisense sequences may be made, as long as the sequences hybridize to and interfere with expression of the corresponding mRNA. In this manner, antisense constructions having 70%, preferably 80%, more preferably 85% sequence similarity to the corresponding antisensed sequences may be used. Furthermore, portions of the antisense nucleotides may be used to disrupt the expression of the target gene. Generally, sequences of at least 50 nucleotides, 100 nucleotides, 200 nucleotides, or greater may be used. When delivered into a plant cell, expression of the antisense DNA sequence prevents normal expression of the DNA nucleotide sequence for the targeted gene. In this manner, production of the native protein encoded by the targeted gene is inhibited to achieve a desired phenotypic response. Thus the promoter is linked to antisense DNA sequences to reduce or inhibit expression of a native protein in the plant.
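The relationship between a sense fragment and its antisense counterpart, and a rough check of a modified antisense sequence against the 70%, 80%, and 85% figures mentioned above, can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and simple ungapped percent identity over equal-length sequences is used here as a stand-in for sequence similarity; it is not the alignment method of the specification.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def antisense_of(sense_fragment: str) -> str:
    """Antisense strand (5' to 3') for a sense-strand DNA fragment."""
    return reverse_complement(sense_fragment)


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    sense = "ATGGCCAAGTTC"           # made-up fragment of a target gene
    exact = antisense_of(sense)      # perfectly complementary antisense
    modified = exact[:-1] + "A"      # same sequence with its last base set to A
    print(exact, f"{percent_identity(exact, modified):.1f}%")
```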

The present invention concerns an isolated polynucleotide comprising a nucleotide sequence encoding a functional FIE polypeptide having at least 80% identity, based on the GAP (GCG Version 10) method of alignment, to a polypeptide selected from the group consisting of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68 and 70.

The present invention also concerns an isolated polynucleotide comprising a chromosomal nucleotide sequence having at least 80% identity, based on the GAP (GCG Version 10) method of alignment, to a nucleotide of SEQ ID NO: 71 or 72.

Preferably, the isolated nucleotide sequence comprises a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, and 72 that codes for the polypeptide selected from the group consisting of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68 and 70.

Nucleic acid fragments encoding at least a portion of several proteins involved in seed development have been isolated and identified by comparison of random plant cDNA sequences to public databases containing nucleotide and protein sequences, using the BLAST algorithms well known to those skilled in the art. The nucleic acid fragments of the instant invention may be used to isolate cDNAs and genes encoding homologous proteins from the same or other plant species. Isolation of homologous genes using sequence-dependent protocols is well known in the art. Examples of sequence-dependent protocols include, but are not limited to, methods of nucleic acid hybridization, and methods of DNA and RNA amplification as exemplified by various uses of nucleic acid amplification technologies (e.g., polymerase chain reaction, ligase chain reaction).

For example, genes encoding other fertilization-independent endosperm proteins, either as cDNAs or genomic DNAs, could be isolated directly by using all or a portion of the instant nucleic acid fragments as DNA hybridization probes to screen libraries from any desired plant, employing methodology well known to those skilled in the art. Specific oligonucleotide probes based upon the instant nucleic acid sequences can be designed and synthesized by methods known in the art (e.g., Molecular Cloning: A Laboratory Manual, 2nd Edition, Sambrook, Fritsch, and Maniatis). Moreover, an entire sequence can be used directly to synthesize DNA probes by methods known to the skilled artisan, such as random primer DNA labeling, nick translation, end-labeling techniques, or RNA probes using available in vitro transcription systems. In addition, specific primers can be designed and used to amplify a part or all of the instant sequences. The resulting amplification products can be labeled directly during amplification reactions or labeled after amplification reactions, and used as probes to isolate full length cDNA or genomic fragments under conditions of appropriate stringency.

In addition, two short segments of the instant nucleic acid fragments may be used in polymerase chain reaction protocols to amplify longer nucleic acid fragments encoding homologous genes from DNA or RNA. The polymerase chain reaction may also be performed on a library of cloned nucleic acid fragments wherein the sequence of one primer is derived from the instant nucleic acid fragments, and the sequence of the other primer takes advantage of the presence of the polyadenylic acid tracts to the 3′ end of the mRNA precursor encoding plant genes. Alternatively, the second primer sequence may be based upon sequences derived from the cloning vector. For example, the skilled artisan can follow the RACE protocol (Frohman et al., (1988) Proc. Natl. Acad. Sci. USA 85:8998-9002) to generate cDNAs by using PCR to amplify copies of the region between a single point in the transcript and the 3′ or 5′ end. Primers oriented in the 3′ and 5′ directions can be designed from the instant sequences. Using commercially available 3′ RACE or 5′ RACE systems (BRL), specific 3′ or 5′ cDNA fragments can be isolated (Ohara et al. (1989) Proc. Natl. Acad. Sci. USA 86:5673-5677; Loh et al. (1989) Science 243:217-220). Products generated by the 3′ and 5′ RACE procedures can be combined to generate full-length cDNAs (Frohman and Martin (1989) Techniques 1:165). Consequently, a polynucleotide comprising a nucleotide sequence of at least 60 (preferably at least 40, most preferably at least 30) contiguous nucleotides derived from a nucleotide sequence selected from the group consisting of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, and 72 and the complement of such nucleotide sequences may be used in such methods to obtain a nucleic acid fragment encoding a substantial portion of an amino acid sequence of a polypeptide.

The present invention relates to a method of obtaining a nucleic acid fragment encoding a substantial portion of a fertilization-independent endosperm polypeptide, comprising the steps of: synthesizing an oligonucleotide primer comprising a nucleotide sequence of at least 60 (preferably at least 40, most preferably at least 30) contiguous nucleotides derived from a nucleotide sequence selected from the group consisting of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71 and 72, and the complement of such nucleotide sequences; and amplifying a nucleic acid fragment (preferably a cDNA inserted in a cloning vector) using the oligonucleotide primer. The amplified nucleic acid fragment preferably will encode a portion of a fertilization-independent endosperm polypeptide.
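As a minimal sketch of deriving primers from contiguous nucleotides of a known sequence and its complement, the following takes a forward primer from the 5′ end of a template and a reverse primer from the complement of its 3′ end. The function names and the end-anchored choice of primer sites are assumptions made for illustration; practical primer design would also weigh melting temperature, specificity, and secondary structure.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def primer_pair(template: str, length: int = 30) -> tuple:
    """Return (forward, reverse) primers of `length` contiguous nucleotides.

    forward: the first `length` bases of the template (sense strand).
    reverse: reverse complement of the last `length` bases, i.e. a
             contiguous stretch of the complementary strand, written 5' to 3'.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    if len(template) < 2 * length:
        raise ValueError("template too short for two non-overlapping primers")
    forward = template[:length]
    reverse = reverse_complement(template[-length:])
    return forward, reverse


if __name__ == "__main__":
    # Made-up template; a short primer length is used only to keep the demo small.
    fwd, rev = primer_pair("ATGCCCGGGTTTAAA", length=4)
    print(fwd, rev)
```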

Availability of the instant nucleotide and deduced amino acid sequences facilitates immunological screening of cDNA expression libraries. Synthetic peptides representing portions of the instant amino acid sequences may be synthesized. These peptides can be used to immunize animals to produce polyclonal or monoclonal antibodies with specificity for peptides or proteins comprising the amino acid sequences. These antibodies can then be used to screen cDNA expression libraries to isolate full-length cDNA clones of interest (Lerner (1984) Adv. Immunol. 36:1-34; Maniatis).

In another embodiment, this invention concerns viruses and host cells comprising either the chimeric genes of the invention as described herein or an isolated polynucleotide of the invention as described herein. Examples of host cells which can be used to practice the invention include, but are not limited to, yeast, bacteria, and plants.

As was noted above, the nucleic acid fragments of the instant invention may be used to create transgenic plants in which the disclosed polypeptides are present at higher or lower levels than normal or in cell types or developmental stages in which they are not normally found. This would have the effect of altering endosperm and/or embryo formation in those plants.

Overexpression of the proteins of the instant invention may be accomplished by first constructing a chimeric gene in which the coding region is operably linked to a promoter capable of directing expression of a gene in the desired tissues at the desired stage of development. The chimeric gene may comprise promoter sequences and translation leader sequences derived from the same genes. 3′ non-coding sequences encoding transcription termination signals may also be provided. The instant chimeric gene may also comprise one or more introns in order to facilitate gene expression.

Plasmid vectors comprising the instant isolated polynucleotide (or chimeric gene) may be constructed. The choice of plasmid vector is dependent upon the method that will be used to transform host plants. The skilled artisan is well aware of the genetic elements that must be present on the plasmid vector in order to successfully transform, select and propagate host cells containing the chimeric gene. The skilled artisan will also recognize that different independent transformation events will result in different levels and patterns of expression (Jones et al. (1985) EMBO J. 4:2411-2418; De Almeida et al. (1989) Mol. Gen. Genetics 218:78-86), and thus that multiple events must be screened in order to obtain lines displaying the desired expression level and pattern. Such screening may be accomplished by Southern analysis of DNA, Northern analysis of mRNA expression, Western analysis of protein expression, or phenotypic analysis.

For some applications it may be useful to direct the instant polypeptides to different cellular compartments, or to facilitate their secretion from the cell. It is thus envisioned that the chimeric gene described above may be further supplemented by directing the coding sequence to encode the instant polypeptides with appropriate intracellular targeting sequences such as transit sequences (Keegstra (1989) Cell 56:247-253), signal sequences or sequences encoding endoplasmic reticulum localization (Chrispeels (1991) Ann. Rev. Plant Phys. Plant Mol. Biol. 42:21-53) or nuclear localization signals (Raikhel (1992) Plant Phys. 100:1627-1632) with or without removing targeting sequences that are already present. While the references cited give examples of each of these, the list is not exhaustive and more targeting signals of use may be discovered in the future.

It may also be desirable to reduce or eliminate expression of genes encoding the instant polypeptides in plants for some applications. In order to accomplish this, a chimeric gene designed for co-suppression of the instant polypeptide can be constructed by linking a gene or gene fragment encoding that polypeptide to plant promoter sequences. Alternatively, a chimeric gene designed to express antisense RNA for all or part of the instant nucleic acid fragment can be constructed by linking the gene or gene fragment in reverse orientation to plant promoter sequences. Either the co-suppression or antisense chimeric genes could be introduced into plants via transformation wherein expression of the corresponding endogenous genes is reduced or eliminated.

Molecular genetic solutions to the generation of plants with altered gene expression have a decided advantage over more traditional plant breeding approaches. Changes in plant phenotypes can be produced by specifically inhibiting expression of one or more genes by antisense inhibition or co-suppression (U.S. Pat. Nos. 5,190,931, 5,107,065, and 5,283,323), by formation of double-stranded RNA (International Publication Number WO 99/53050; Smith et al., Nature 407:319-320 (2000)), and through other methods known to those of skill in the art.

An antisense, co-suppression, or dsRNA construct would act as a dominant negative regulator of gene activity. While conventional mutations can yield negative regulation of gene activity, these effects are most likely recessive. The dominant negative regulation available with a transgenic approach may be advantageous from a breeding perspective. In addition, the ability to restrict the expression of a specific phenotype to the reproductive tissues of the plant by the use of tissue-specific promoters may confer agronomic advantages relative to conventional mutations which may have an effect in all tissues in which a mutant gene is ordinarily expressed.

The person skilled in the art will know that special considerations are associated with the use of antisense or co-suppression technologies in order to reduce expression of particular genes. For example, the proper level of expression of sense or antisense genes may require the use of different chimeric genes utilizing different regulatory elements known to the skilled artisan. Once transgenic plants are obtained by one of the methods described above, it will be necessary to screen individual transgenics for those that most effectively display the desired phenotype. Accordingly, the skilled artisan will develop methods for screening large numbers of transformants. The nature of these screens will generally be chosen on practical grounds. For example, one can screen by looking for changes in gene expression by using antibodies specific for the protein encoded by the gene being suppressed, or one could establish assays that specifically measure enzyme activity. A preferred method will be one which allows large numbers of samples to be processed rapidly, since it will be expected that a large number of transformants will be negative for the desired phenotype.

In another embodiment, the present invention concerns a polypeptide that has at least 80% identity, based on the GAP (GCG Version 10) method of alignment, to a polypeptide selected from the group consisting of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68 and 70.

The instant polypeptides (or portions thereof) may be produced in heterologous host cells, particularly in the cells of microbial hosts, and can be used to prepare antibodies to these proteins by methods well known to those skilled in the art. The antibodies are useful for detecting the polypeptides of the instant invention in situ in cells or in vitro in cell extracts. Preferred heterologous host cells for production of the instant polypeptides are microbial hosts. Microbial expression systems and expression vectors containing regulatory sequences that direct high level expression of foreign proteins are well known to those skilled in the art. Any of these could be used to construct a chimeric gene for production of the instant polypeptides. This chimeric gene could then be introduced into appropriate microorganisms via transformation to provide high level expression of the encoded reproduction proteins. An example of a vector for high level expression of the instant polypeptides in a bacterial host is provided (Example 16).

All or a substantial portion of the polynucleotides of the instant invention may also be used as probes for genetically and physically mapping the genes that they are a part of, and used as markers for traits linked to those genes. Such information may be useful in plant breeding in order to develop lines with desired phenotypes. For example, the instant nucleic acid fragments may be used as restriction fragment length polymorphism (RFLP) markers. Southern blots (Maniatis) of restriction-digested plant genomic DNA may be probed with the nucleic acid fragments of the instant invention. The resulting banding patterns may then be subjected to genetic analyses using computer programs such as MapMaker (Lander et al. (1987) Genomics 1:174-181) in order to construct a genetic map. In addition, the nucleic acid fragments of the instant invention may be used to probe Southern blots containing restriction endonuclease-treated genomic DNAs of a set of individuals representing parent and progeny of a defined genetic cross. Segregation of the DNA polymorphisms is noted and used to calculate the position of the instant nucleic acid sequence in the genetic map previously obtained using this population (Botstein et al. (1980) Am. J. Hum. Genet. 32:314-331).

The production and use of plant gene-derived probes for use in genetic mapping is described in Bernatzky and Tanksley (1986) Plant Mol. Biol. Reporter 4:37-41. Numerous publications describe genetic mapping of specific cDNA clones using the methodology outlined above or variations thereof. For example, F2 intercross populations, backcross populations, randomly mated populations, near isogenic lines, and other sets of individuals may be used for mapping. Such methodologies are well known to those skilled in the art.

Nucleic acid probes derived from the instant nucleic acid sequences may also be used for physical mapping (i.e., placement of sequences on physical maps; see Hoheisel et al. In: Nonmammalian Genomic Analysis: A Practical Guide, Academic Press 1996, pp. 319-346, and references cited therein).

In another embodiment, nucleic acid probes derived from the instant nucleic acid sequences may be used in direct fluorescence in situ hybridization (FISH) mapping (Trask (1991) Trends Genet. 7:149-154). Although current methods of FISH mapping favor use of large clones (several to several hundred kilobases; see Laan et al. (1995) Genome Res. 5:13-20), improvements in sensitivity may allow performance of FISH mapping using shorter probes.

A variety of nucleic acid amplification-based methods of genetic and physical mapping may be carried out using the instant nucleic acid sequences. Examples include allele-specific amplification (Kazazian (1989) J. Lab. Clin. Med. 11:95-96), polymorphism of PCR-amplified fragments (CAPS; Sheffield et al. (1993) Genomics 16:325-332), allele-specific ligation (Landegren et al. (1988) Science 241:1077-1080), nucleotide extension reactions (Sokolov (1990) Nucleic Acid Res. 18:3671), Radiation Hybrid Mapping (Walter et al. (1997) Nat. Genet. 7:22-28) and Happy Mapping (Dear and Cook (1989) Nucleic Acid Res. 17:6795-6807). For these methods, the sequence of a nucleic acid fragment is used to design and produce primer pairs for use in the amplification reaction or in primer extension reactions. The design of such primers is well known to those skilled in the art. In methods employing PCR-based genetic mapping, it may be necessary to identify DNA sequence differences between the parents of the mapping cross in the region corresponding to the instant nucleic acid sequence. This, however, is generally not necessary for mapping methods.

Loss-of-function mutant phenotypes may be identified for the instant cDNA clones either by targeted gene disruption protocols or by identifying specific mutants for these genes contained in a maize population carrying mutations in all possible genes (Ballinger and Benzer (1989) Proc. Natl. Acad. Sci USA 86:9402-9406; Koes et al. (1995) Proc. Natl. Acad. Sci USA 92:8149-8153; Bensen et al. (1995) Plant Cell 7:75-84). The latter approach may be accomplished in two ways. First, short segments of the instant nucleic acid fragments may be used in polymerase chain reaction protocols in conjunction with a mutation tag sequence primer on DNAs prepared from a population of plants in which Mutator transposons or some other mutation-causing DNA element has been introduced (see Bensen, supra). The amplification of a specific DNA fragment with these primers indicates the insertion of the mutation tag element in or near the plant gene encoding the instant polypeptides. Alternatively, the instant nucleic acid fragment may be used as a hybridization probe against PCR amplification products generated from the mutation population using the mutation tag sequence primer in conjunction with an arbitrary genomic site primer, such as that for a restriction enzyme site-anchored synthetic adaptor. With either method, a plant containing a mutation in the endogenous gene encoding the instant polypeptides can be identified and obtained. This mutant plant can then be used to determine or confirm the natural function of the instant polypeptides disclosed herein.

The Trait Utility System for Corn (TUSC) is a method that employs genetic and molecular techniques to facilitate the study of gene function in maize. Studying gene function implies that the gene's sequence is already known; thus the method works in reverse: from sequence to phenotype. This kind of application is referred to as “reverse genetics”, which contrasts with “forward” methods that are designed to identify and isolate the gene(s) responsible for a particular trait (phenotype). One of skill in the art could readily conceive of use of this procedure with the sequences disclosed in the current application.

Pioneer Hi-Bred International, Inc., has a proprietary collection of maize genomic DNA from approximately 42,000 individual F₁ plants (Reverse genetics for maize, Meeley, R. and Briggs, S., 1995, Maize Genet. Coop. Newslett. 69:67, 82). The genome of each of these individuals contains multiple copies of the transposable element family, Mutator (Mu). The Mu family is highly mutagenic; in the presence of the active element Mu-DR, these elements transpose throughout the genome, inserting into genic regions, and often disrupting gene function. By collecting genomic DNA from a large number (42,000) of individuals, Pioneer has assembled a library of the mutagenized maize genome.

Mu insertion events are predominantly heterozygous; given the recessive nature of most insertional mutations, the F₁ plants appear wild-type. Each of the F₁ plants is selfed to produce F₂ seed, which is collected. In generating the F₂ progeny, insertional mutations segregate in a Mendelian fashion and so are useful for investigating a mutant allele's effect on the phenotype. The TUSC system has been successfully used by a number of laboratories to identify the function of a variety of genes (Cloning and characterization of the maize An1 gene, Bensen, R. J., et al., 1995, Plant Cell 7:75-84; Diversification of C-function activity in maize flower development, Mena, M., et al., 1996, Science 274:1537-1540; Analysis of a chemical plant defense mechanism in grasses, Frey, M., et al., 1997, Science 277:696-699; The control of maize spikelet meristem fate by the APETALA2-like gene Indeterminate spikelet 1, Chuck, G., Meeley, R. B., and Hake, S., 1998, Genes & Development 12:1145-1154; A SecY homologue is required for the elaboration of the chloroplast thylakoid membrane and for normal chloroplast gene expression, Roy, L. M. and Barkan, A., 1998, J. Cell Biol. 141:1-11).
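The Mendelian segregation mentioned above can be illustrated with a short sketch. It assumes a single heterozygous Mu insertion (written here as the hypothetical genotype Mu/+) that is transmitted equally through male and female gametes; neither the function name nor the genotype labels come from the TUSC documentation.

```python
from collections import Counter
from itertools import product


def selfed_f2_ratios(genotype=("Mu", "+")):
    """Return expected F2 genotype fractions after selfing one F1 plant.

    Assumption (not from the source): one insertion locus, and each
    allele is transmitted with equal probability through both gametes.
    """
    # Enumerate every pairing of one female gamete with one male gamete.
    pairs = product(genotype, repeat=2)
    # Sort each pair so that ("Mu", "+") and ("+", "Mu") count as one genotype.
    counts = Counter(tuple(sorted(pair)) for pair in pairs)
    total = sum(counts.values())
    return {gt: n / total for gt, n in counts.items()}


ratios = selfed_f2_ratios()
for genotype, fraction in sorted(ratios.items()):
    print(genotype, fraction)
```

Under these assumptions one quarter of the F₂ progeny are expected to be homozygous for the insertion, which is the class in which a recessive phenotype would be scored.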

The disclosure of each reference set forth herein is incorporated herein by reference in its entirety.

EXAMPLES

The present invention is further defined in the following Examples, in which parts and percentages are by weight and degrees are Celsius, unless otherwise stated. It should be understood that these Examples, while indicating preferred embodiments of the invention, are given by way of illustration only and not by way of limitation.

From the above discussion and these Examples, one skilled in the art can ascertain the essential characteristics of this invention, and without departing from the spirit and scope thereof, can make various changes and modifications of the invention to adapt it to various usages and conditions. Thus, various modifications of the invention in addition to those shown and described herein will be apparent to those skilled in the art from the foregoing description. Such modifications are also intended to fall within the scope of the appended claims.

Example 1 Composition of cDNA Libraries: Isolation and Sequencing of cDNA Clones

cDNA libraries representing mRNAs from various catalpa, maize, eucalyptus, rice, soybean, sunflower and wheat tissues were prepared. The characteristics of the source tissues are described below in Table 2.

TABLE 2
cDNA Libraries from Catalpa, Maize, Eucalyptus, Rice, Soybean, Sunflower and Wheat

Library | Tissue | Clone
ccase-b | Maize callus, somatic embryo formed | ccase-b.pk0026.g4
cen1 | Maize endosperm 10 to 11 days after pollination | cen1.mn0001.g10
cen3n | Maize endosperm 20 days after pollination* | cen3n.pk0076.b8
cpb1c | Maize pooled BMS treated with chemicals related to Ca⁺⁺ channel** | cpb1c.pk001.d10
eec1c | Eucalyptus tereticornis capsules (older flowers, lost stamens, possibly fertilized) from adult tree | eec1c.pk003.e23
hlp1c | Helianthus sp. leaf infected with phomopsis | hlp1c.pk003.e8
ncs | Catalpa speciosa developing seed | ncs.pk0019.h3
p0003 | Maize premeiotic ear shoot, 0.2-4 cm | p0003.cgped29rb; p0003.cgpfn34f; p0003.cgpfn34rb
p0037 | Maize V5 stage*** roots infested with corn root worm | p0037.crwao47r
p0041 | Maize root tips smaller than 5 mm in length four days after imbibition | p0041.crtaw93r
p0101 | Maize embryo sacs 4 days after pollination* | p0101.cgamg48r
p0104 | Maize roots V5, corn root worm infested* | p0104.cabbn62r
p0107 | Maize whole kernels 7 days after pollination* | p0107.cbcai79r
p0119 | Maize V12 stage*** ear shoot with husk, night harvested* | p0119.cmtoh49r
p0120 | Pooled endosperm: 18, 21, 24, 27 and 29 days after pollination* | p0120.cdebd48r
rcal1c | Rice nipponbare callus | rcal1c.pk0001.d2
ses2w | Soybean embryogenic suspension 2 weeks after subculture | ses2w.pk0015.b10
wkm1c | Wheat kernel malted 55 hours at 22 degrees Celsius | wkm1c.pk0003.f4

*These libraries were normalized essentially as described in U.S. Pat. No. 5,482,845, incorporated herein by reference.
**Chemicals used included caffeine, BHQ, cyclopiazonic acid, nifedipine, verapamil, fluphenazine-N-2-chloroethane, calmidazolium chloride.
***Maize developmental stages are explained in the publication “How a corn plant develops” from the Iowa State University Coop. Ext. Service Special Report No. 48 reprinted June 1993.

cDNA libraries may be prepared by any one of many methods available. For example, the cDNAs may be introduced into plasmid vectors by first preparing the cDNA libraries in Uni-ZAP™ XR vectors according to the manufacturer's protocol (Stratagene Cloning Systems, La Jolla, Calif.). The Uni-ZAP™ XR libraries are converted into plasmid libraries according to the protocol provided by Stratagene. Upon conversion, cDNA inserts will be contained in the plasmid vector pBluescript. In addition, the cDNAs may be introduced directly into precut Bluescript II SK(+) vectors (Stratagene) using T4 DNA ligase (New England Biolabs), followed by transfection into DH10B cells according to the manufacturer's protocol (GIBCO BRL Products). Once the cDNA inserts are in plasmid vectors, plasmid DNAs are prepared from randomly picked bacterial colonies containing recombinant pBluescript plasmids, or the insert cDNA sequences are amplified via polymerase chain reaction using primers specific for vector sequences flanking the inserted cDNA sequences. Amplified insert DNAs or plasmid DNAs are sequenced in dye-primer sequencing reactions to generate partial cDNA sequences (expressed sequence tags or “ESTs”; see Adams et al., (1991) Science 252:1651-1656). The resulting ESTs are analyzed using a Perkin Elmer Model 377 fluorescent sequencer.

Example 2 Identification of cDNA Clones

The cDNA sequences obtained in Example 1 were analyzed for similarity to all publicly available DNA sequences contained in the “nr” database using the BLASTN algorithm (Basic Local Alignment Search Tool; Altschul et al. (1990) J. Mol. Biol. 215:403-410) provided by the National Center for Biotechnology Information (NCBI; see www.ncbi.nlm.nih.gov/BLAST/).

The DNA sequences were also translated in all reading frames and compared for similarity to all publicly available protein sequences contained in the “nr” database (comprising all non-redundant GenBank CDS translations, sequences derived from the 3-dimensional structure Brookhaven Protein Data Bank, the last major release of the SWISS-PROT protein sequence database, EMBL, and DDBJ databases) using the BLASTX algorithm (Gish and States (1993) Nat. Genet. 3:266-272) provided by the NCBI.

For convenience, the P-value (probability) of observing a match of a cDNA sequence to a sequence contained in the searched databases merely by chance as calculated by BLAST is reported herein as a “pLog” value, which represents the negative of the logarithm of the reported P-value. Accordingly, the greater the pLog value, the greater the likelihood that the cDNA sequence and the BLAST “hit” represent homologous proteins.
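The pLog conversion can be sketched in a few lines. The text does not state the base of the logarithm; base 10 is assumed here, and the function name plog is illustrative only.

```python
import math


def plog(p_value: float) -> float:
    """Return the pLog of a BLAST P-value.

    Assumption: the logarithm is base 10; the text says only
    "the negative of the logarithm".
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("P-value must lie in the interval (0, 1]")
    return -math.log10(p_value)


# A smaller chance probability gives a larger pLog score.
print(plog(1e-18), plog(1e-90))
```

Because the transformation is monotonic, ranking hits by descending pLog is equivalent to ranking them by ascending P-value.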

Abbreviations which may be used in describing the sequences listed in the following tables include:

-   EST: individual Expressed Sequence Tag
-   FIS: Full Insert Sequence; the entire cDNA insert comprising the indicated EST
-   Contig: an assembly of two or more contiguous ESTs
-   Contig+: a contig comprising an FIS and one or more ESTs
-   CGS: Complete Gene Sequence; a sequence encoding an entire protein, derived from one or more of the above DNA segments; may be determined in combination with PCR

Example 3 Characterization of cDNA EST Clones Encoding Fertilization-Independent Endosperm Protein

The BLASTX search using the EST sequences of clones listed in Table 1 revealed similarity of the polypeptides encoded by the cDNAs to fertilization-independent endosperm protein from Arabidopsis thaliana (NCBI Identifier No. gi 4567095). Scores, on a pLog basis, ranged from 18.0 to 89.7, with an average score of 50.3.

Example 4 Characterization of cDNA FIS and CGS Clones Encoding Fertilization-Independent Endosperm Protein

The sequence of the entire cDNA insert (FIS) in each of the clones listed in Table 3 was determined. Further sequencing and searching of the DuPont proprietary database allowed the identification of other maize, rice, soybean, wheat, eucalyptus, sunflower, and catalpa clones encoding fertilization-independent endosperm proteins. A BLASTX search using the full insert sequences and complete gene sequences listed in Table 1 revealed similarity of the polypeptides encoded by these cDNAs to fertilization-independent endosperm protein from Arabidopsis thaliana (NCBI Identifier No. gi 4567095). Scores, on a pLog basis, averaged 57.4 for Full Insert Sequences and 150.5 for Complete Gene Sequences.

Example 5

The amino acid sequences set forth in SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68 and 70 were compared to the Arabidopsis thaliana sequence gi4567095 using the Megalign program of the LASERGENE bioinformatics computing suite (DNASTAR Inc., Madison, Wis.). Multiple alignment of the sequences was performed using the Clustal method of alignment (Higgins and Sharp (1989) CABIOS 5:151-153) with the default parameters (GAP PENALTY=10, GAP LENGTH PENALTY=10). Default parameters for pairwise alignments using the Clustal method were KTUPLE=1, GAP PENALTY=3, WINDOW=5 and DIAGONALS SAVED=5. Sequence alignments and BLAST scores and probabilities indicated that the nucleic acid fragments comprising the instant cDNA clones encoded a substantial portion of a fertilization-independent endosperm protein. These sequences represent the first catalpa, eucalyptus, maize, rice, soybean, sunflower and wheat sequences encoding fertilization-independent endosperm proteins known to Applicant.

Example 6 Mapping and Isolation of Genomic Sequences of FIE-A and FIE-B

ZmFIE-A (also referred to as ZmFIE1) maps to Chromosome 4 (bin 4.04) and ZmFIE-B (also referred to as ZmFIE2) maps to Chromosome 10 (bin 10.03). Map positions were identified by a standard procedure using RFLP analysis of a mapping population (Davis et al., Genetics (1999) 152:1137-1172).

To obtain genomic copies of ZmFIE genes, BAC (Bacterial Artificial Chromosome) libraries were used. BAC libraries were constructed according to the Texas A&M BAC Center protocol (http://hbz.tamu.edu/bacindex.html). High-molecular-weight DNA isolated from line Mo17, embedded in LMP agarose microbeads, was partially digested by HindIII. The DNA was then size-selected by pulsed-field gel electrophoresis to remove the smaller DNA fragments that can compete more effectively than the larger DNA fragments for vector ends. The size-selected DNA fragments were ligated into pBeloBAC11 at the HindIII site. BAC libraries were screened by hybridization with ³²P-labeled probes (Maniatis). SEQ ID NO: 1 and SEQ ID NO: 29 correspond to ZmFIE-B and ZmFIE-A ESTs. BAC DNAs were isolated, subcloned into Bluescript II (SK+) vector (Stratagene), and sequenced.

The genomic sequences of the maize and Arabidopsis FIE genes show a high degree of conservation of intron/exon structure. There are 13 exons with almost identical lengths (with the accuracy of BestFit program, GCG) in the maize and Arabidopsis genes, with exceptions of 5′ and 3′ UTRs. This high degree of conservation between FIE genes in monocots and dicots suggests that gene function is under strong evolutionary pressure. The genomic structure of the ZmFIE-A gene is different from the ZmFIE-B and Arabidopsis genes by 1 intron of 385 nt length, which is positioned within the 5′ UTR, 6 nt upstream of the ATG codon. Introns located in the 5′ UTR are important for tissue-specific expression of the genes (McElroy et al. (1991) Molecular & General Genetics 231:150-160). As is shown in Example 7, ZmFIE-A expression occurs mostly in developing endosperm; this regulation may be achieved through splicing of the 5′ UTR intron.

TABLE 3
The exon lengths (in bp) of the maize and Arabidopsis FIE genes

Gene | 1 (5′ UTR) | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 (3′ UTR)
ZmFIE-A | 340 | 66 | 125 | 83 | 96 | 75 | 84 | 71 | 62 | 98 | 65 | 59 | 347
ZmFIE-B | 509 | 66 | 125 | 83 | 96 | 75 | 84 | 71 | 62 | 98 | 65 | 59 | 230
AtFIE | 375 | 66 | 122 | 83 | 96 | 75 | 84 | 71 | 62 | 106 | 57 | 59 | 317
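The conservation of exon lengths reported in Table 3 can be checked with a short comparison. The lengths below are copied from Table 3; the dictionary layout and the function name are illustrative choices, not part of the original analysis, which relied on the BestFit program.

```python
# Exon lengths in bp from Table 3; positions 1 and 13 include the UTRs.
EXON_LENGTHS = {
    "ZmFIE-A": [340, 66, 125, 83, 96, 75, 84, 71, 62, 98, 65, 59, 347],
    "ZmFIE-B": [509, 66, 125, 83, 96, 75, 84, 71, 62, 98, 65, 59, 230],
    "AtFIE": [375, 66, 122, 83, 96, 75, 84, 71, 62, 106, 57, 59, 317],
}


def differing_internal_exons(gene_a: str, gene_b: str) -> dict:
    """Return {exon_number: (len_a, len_b)} for exons 2-12 that differ.

    Exons 1 and 13 are skipped because they contain the 5' and 3' UTRs,
    which the text excludes from the comparison.
    """
    a = EXON_LENGTHS[gene_a]
    b = EXON_LENGTHS[gene_b]
    return {i + 1: (a[i], b[i]) for i in range(1, 12) if a[i] != b[i]}


print(differing_internal_exons("ZmFIE-A", "ZmFIE-B"))  # {}
print(differing_internal_exons("ZmFIE-A", "AtFIE"))
```

With the tabulated values, the two maize genes have identical internal exon lengths, and the maize and Arabidopsis genes differ only at exons 3, 10 and 11.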

Example 7 Analysis of Expression of FIE-A and FIE-B by RT-PCR

To determine ZmFIE expression patterns, RNA was extracted from different tissues and RT-PCR was performed using ZmFIE-A- and ZmFIE-B-specific primers.

With the exception of pollen, ZmFIE-B is expressed in all tissues examined, including leaf, immature leaf, tassel, stem, silk, 3-day root tissue, ovules before pollination, and in whole-kernel, endosperm, and embryo tissues at 11 days after pollination (DAP). Pollen is the only tissue where ZmFIE-B gene expression is very low. It is very likely that ZmFIE-B expression is repressed in the sperm nuclei, but that the gene is still active in the vegetative nucleus of the pollen.

Conversely, ZmFIE-A is expressed only in kernels after pollination. None of the vegetative tissues has a detectable level of the ZmFIE-A transcripts. ZmFIE-A also is not expressed in mature pollen.

In a time-course comparison of ZmFIE-A and ZmFIE-B expression, whole kernels were collected at intervals after pollination and RT-PCR was performed. ZmFIE-A mRNA was first detected at about 9 days after pollination (DAP), peaked at about 11 DAP, and was markedly reduced after about 20 DAP. ZmFIE-B was expressed at a consistent level during the time tested, from 3 DAP to 25 DAP. These results were confirmed by Northern hybridization of the poly-A RNA extracted from the same set of tissues.

Example 8 Analysis of Expression of FIE-A and FIE-B by Lynx MPSS™

To further refine analysis of expression of FIE-A and FIE-B, Lynx MPSS™ (massively parallel signature sequencing) experiments were used for BLAST searching of the 17-mer tags expressed in various tissues. (For a description of Lynx technology, see www.lynxgen.com or Nature Biotechnology (2000) 18:630-634.) In complete agreement with RT-PCR and Northern results (Example 7), 17-mer tags of ZmFIE-A transcripts were not detected in ovules before pollination, but were detected in the endosperm of developing kernels after pollination, rapidly reaching a peak at about 8 to 9 days after pollination (DAP), then diminishing to reach the basal level at about 30 DAP. A very low level of ZmFIE-A tags was found in the embryo. These results provide strong evidence that the ZmFIE-A gene is expressed specifically in endosperm after fertilization. Expression of the ZmFIE-B gene cannot be detected by Lynx technology because the ZmFIE-B gene is lacking the GATC restriction site used in creating 17-mer tags.

Example 9 In Situ Localization of FIE mRNA in Ovules and Developing Kernels

To further determine expression patterns of ZmFIE genes in maize, in situ hybridization was performed using the protocol of Jackson, D. P. (1991) (In situ Hybridization in Plants, Molecular Plant Pathology: A Practical Approach, D. J. Bowles, S. J. Gurr, and M. McPherson, eds.; Oxford University Press, England, pp. 63-74). Sense and antisense mRNA probes of about 0.9 kb corresponding to FIE genes were labeled non-isotopically with digoxigenin and incubated with fixed sections of maize tissues from ovules at silking and from kernels at 5, 8 and 12 days after pollination (DAP). FIE-A hybridization was performed only with ovules and kernels at 5 DAP. Following extensive washing to remove unbound probe, sections were incubated with anti-digoxigenin alkaline phosphatase to detect areas of probe hybridization. FIE mRNA was detected specifically with the antisense probe; the sense probe did not hybridize, therefore serving as a negative control.

FIE antisense probes gave a signal in the embryo sac of the mature ovules at silking. The signal within the embryo sac before fertilization is likely due to ZmFIE-B mRNA, because RT-PCR and Lynx data do not show a detectable level of ZmFIE-A gene expression in ovules before fertilization. In kernels at 2 to 5 DAP, the most intense signal appeared in the embryo-surrounding region and on the periphery of the developing endosperm. At the later stages (8, 10, or 15 DAP), the signal persists at the embryo, but is not detectable in the endosperm using FIE-B probe. An in situ experiment with ZmFIE-A was not performed at these stages.

FIE proteins belong to the Polycomb group (PcG) proteins, which are involved in multiple aspects of embryogenesis in Drosophila and mammals. PcG proteins appear to have a conserved role in the zygotic control of the development of the anterior-posterior axis. The Arabidopsis FIE protein plays a pleiotropic role as a repressor of endosperm development before pollination, a regulator of the establishment of the anterior-posterior axis in the endosperm, and a factor of the embryo development.

The differential pattern of expression of the ZmFIE genes argues that functions of the maize FIE genes are separated in evolution. The ZmFIE-B gene may play a role as a repressor of seed development before pollination in the embryo sac, and as a regulator of the anterior-posterior axis in the developing embryo. The ZmFIE-A gene, induced after pollination and expressed only in the endosperm, may play a role as a regulator of the establishment of the anterior-posterior axis in the endosperm.

One could expect that inactivation of ZmFIE-B function would result in seed development without fertilization (apomixis), but that inactivation of the ZmFIE-A gene would interfere with endosperm development.

Example 10 Isolation and Identification of the Promoter Regions of FIE-A and FIE-B

5.5 kb of the FIE-A upstream region and 6.0 kb of the ZmFIE-B upstream region were sequenced from the BAC genomic clones (Example 6).

ZmFIE-A 5′ Upstream Region (SEQ ID NO: 73)

The 5′ upstream region of the ZmFIE-A gene shares sequence homology with the 5′ LTR (long terminal repeat) of the retrotransposon RIRE-1 (GenBank accession # D85597), at positions 2984-3378. Retrotransposable elements are landmarks of the intragenic regions in the maize genome (SanMiguel et al. (1996) Science 274:765-768). Sequence homology to retrotransposons indicates the border of the gene-specific region. According to this definition, the sequence downstream of 3378 nt (nucleotide/s) may be considered as a part of the ZmFIE-A gene. The RNA startpoint is at 4159, as shown by an alignment with the longest EST, cgamg48. Taking these reference points, the basal promoter is located between 3378-4159 nt and is 781 nt long. No repeats or secondary structures are found in the ZmFIE-A basal promoter. There is an intron 386 nt long at position 4319-4705. The intron sequence is present in genomic DNA, but is absent in the cDNA (cgamg48). The intron is positioned just 6 nt upstream from the translation start codon ATG at 4712 nt. This intron may play a regulatory role in ZmFIE-A gene expression, for example, providing the properly spliced RNA only in kernels after fertilization.
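The lengths quoted for the ZmFIE-A upstream region follow from simple coordinate arithmetic on SEQ ID NO: 73. The sketch below takes the coordinates from the paragraph above; the variable names are illustrative, and the lengths are computed as plain differences between the quoted positions, which is an assumption about the coordinate convention.

```python
# Coordinates (nt) in SEQ ID NO: 73, as quoted in the text.
LTR_HOMOLOGY_END = 3378     # end of homology to the RIRE-1 5' LTR
TRANSCRIPTION_START = 4159  # RNA startpoint from alignment with cgamg48
INTRON_START = 4319
INTRON_END = 4705
ATG_POSITION = 4712         # first nucleotide of the start codon

# Basal promoter: from the retrotransposon border to the RNA startpoint.
basal_promoter_length = TRANSCRIPTION_START - LTR_HOMOLOGY_END

# Intron span, taken as the difference between the quoted boundaries.
intron_span = INTRON_END - INTRON_START

# Nucleotides lying strictly between the intron end and the ATG.
intron_to_atg = ATG_POSITION - INTRON_END - 1

print(basal_promoter_length, intron_span, intron_to_atg)  # 781 386 6
```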

ZmFIE-B 5′ Upstream Region (SEQ ID NO: 74)

The size of the ZmFIE-B promoter is estimated to be about 6 kb from the translation start codon ATG to the point of homology with the retrotransposon Milt1 that might be considered as a landmark of the intragenic region. This 6 kb region is a unique sequence with no known homology in the published databases and shows a pattern of repetitive sequences.

The sequence from 2919 to 5237 nt (nucleotides) consists of two types of repeats, named A and B, and a spacer (see FIG. 1). Repeats are organized in the following order: A₁-B₁ spacer B₂-A₂. Repeats A₁ and A₂ are 583 nt long and share 95% homology. Repeats B₁ and B₂ are 348 nt long and share 93% homology. The spacer size is 410 nt. Repeats and a spacer form the 2.3 kb region. The B₁ spacer sequence, C, is repeated again from 321 to 1070 nt of the 5′ upstream region of ZmFIE-B.

A pattern of perfect direct repeats argues for their functional significance. Expression of ZmFIE-B is constitutive and not tissue-specific. The only specific feature of this gene is the repression of the paternal allele during early kernel development (Example 11; also see Lai J. and Messing J., 2001, 43rd Maize Genetics Conference, Abstract P39, page 57). This phenomenon is termed parental imprinting and has been shown for the Arabidopsis FIE gene (Ohad et al., PNAS 93:5319-5324 (1996); Luo et al., PNAS 97:10637-10642 (2000)). In mammals, the imprinting control region (ICR) has been identified as a 2 kb region located from −2 to −4 kb relative to the transcription start of the imprinted genes (Thorvaldsen et al. (1998) Genes and Development 12:3693-3702). The ICR (or the DMD, the differentially methylated domain) regulates imprinting by DNA methylation.

The repetitive structure found upstream of the ZmFIE-B gene may be responsible for imprinting of the ZmFIE-B gene and is being termed the ICE (Imprinting Control Element, to distinguish from the animal ICR). To determine whether the ICE is required for imprinted expression of the ZmFIE-B gene, expression cassettes can be constructed directing expression of the reporter genes with and without fusion with the ICE. If the ICE is required for imprinting, the parent-of-origin expression of the reporter constructs will be observed.

One of skill in the art would recognize that the ICE may provide a tool for the modification of gene expression in developing kernels and could be used as a tool in modifying or controlling imprinting. The ICE may be a target for DNA methylation like the DMD (ICR) in mammals, or the ICE may be a binding site for specific proteins. A protein-mediated mechanism of imprinting seems more likely, because the frequency of the DNA methylation sites CpG and CpNpG is reduced to about 0.5-1% in the ICE and overall 5′ upstream region of the ZmFIE-B gene; equal distribution of di- and tri-nucleotides along DNA sequences predicts a frequency of 6%. The ICE may be used as a binding target for proteins regulating gene expression by imprinting.
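The comparison of observed and expected methylation-site frequencies can be sketched as follows. The counting scheme (overlapping windows along one strand) and the function name are assumptions made for illustration; the text does not describe how the frequencies were computed. Under equal base frequencies, the chance that a given position starts a CpG or a CpNpG is 1/4 × 1/4 = 1/16, about 6%.

```python
def motif_frequencies(sequence: str) -> dict:
    """Return the fraction of positions that start a CpG or CpNpG site.

    Assumption: overlapping windows on a single strand are counted, and
    N in CpNpG stands for any character at the middle position.
    """
    seq = sequence.upper()
    n_di = len(seq) - 1   # number of dinucleotide windows
    n_tri = len(seq) - 2  # number of trinucleotide windows
    cpg = sum(1 for i in range(n_di) if seq[i] == "C" and seq[i + 1] == "G")
    cpnpg = sum(1 for i in range(n_tri) if seq[i] == "C" and seq[i + 2] == "G")
    return {
        "CpG": cpg / n_di if n_di > 0 else 0.0,
        "CpNpG": cpnpg / n_tri if n_tri > 0 else 0.0,
    }


# Expected frequency if all four bases were equally likely at each position.
EXPECTED_UNIFORM = (1 / 4) * (1 / 4)

# Toy input for illustration only; not taken from SEQ ID NO: 74.
print(motif_frequencies("ATCGATCAGT"), EXPECTED_UNIFORM)
```

Applied to the ICE and the surrounding upstream sequence, values well below EXPECTED_UNIFORM would correspond to the depletion described above.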

Example 11 Monitoring of Parent-of-Origin Expression by Allele-Specific Primers

As described in Example 10, ZmFIE-B expression varies with the parent of origin. Only the maternal allele is expressed immediately following pollination; expression of the paternal allele resumes after 10 DAP. This phenomenon, termed imprinting, is mediated by direct repeats (the ICE, Imprinting Control Element) positioned upstream of the ZmFIE-B coding region (Example 10).

Inbreds B73 and Mo17 comprise polymorphisms which aid in monitoring parent-of-origin expression. The differences lie in the genomic fragments in the vicinity of the stop codon of the ZmFIE-B gene.

The B73 genomic sequence (SEQ ID NO: 75) contains a 185-nt insertion with 13-nt terminal inverted repeats. The insertion is flanked by 5-nt direct repeats, which result from a target duplication, providing strong evidence for the transposition origin of the insertion. The insertion is a typical example of so-called MITE elements, which are very abundant components of the maize genome (Wessler, S. R. Plant Physiol. (2001) 125(1):149-151). In the B73 background, ZmFIE-B polyA transcripts are terminated in the middle of the MITE element.

In the Mo17 background, ZmFIE-B polyA transcripts are terminated within genomic sequence with no homology to the MITE element.

Thus, the MITE element was used to design primers specific for B73 or Mo17 ZmFIE-B transcripts. The forward primer, CGTGAAGGCAAAATCTACGTGTGG (SEQ ID NO: 76), is common to both genotypes. The reverse primers are genotype specific. A reverse primer CATTACGTTACAAATATGTGAACCAAACG (SEQ ID NO: 77) amplifies transcripts only from the B73 gene in an RT-PCR reaction. A reverse primer CAGMCAAACAGATGACMCGGTTCCCAAAG (SEQ ID NO: 78) amplifies transcripts only from the Mo17 gene in an RT-PCR reaction. This primer combination allows monitoring of the paternal and maternal ZmFIE-B allele expression. RT-PCR reactions were conducted at various DAP time intervals in B73/Mo17 reciprocal crosses. The maternal ZmFIE-B allele (either B73 or Mo17) is expressed immediately following pollination and continuing through the full 16 days tested, whereas the paternal ZmFIE-B allele (either Mo17 or B73) is expressed beginning at approximately 10 days after pollination and continuing through the full 16 days tested.
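The logic of reading parent-of-origin expression from the genotype-specific RT-PCR products can be written out as a small sketch. The function name, argument layout and the example observation are hypothetical and serve only to show how a reciprocal cross is interpreted; they are not experimental data.

```python
GENOTYPES = {"B73", "Mo17"}


def classify_allele_expression(female_parent, male_parent, detected_amplicons):
    """Map detected genotype-specific products to maternal/paternal alleles.

    detected_amplicons is the set of genotypes whose specific reverse
    primer (SEQ ID NO: 77 for B73, SEQ ID NO: 78 for Mo17) gave a product.
    """
    if {female_parent, male_parent} != GENOTYPES:
        raise ValueError("expected a reciprocal B73/Mo17 cross")
    detected = set(detected_amplicons)
    if not detected <= GENOTYPES:
        raise ValueError("unknown genotype in detected amplicons")
    return {
        "maternal_expressed": female_parent in detected,
        "paternal_expressed": male_parent in detected,
    }


# Hypothetical early time point: B73 female x Mo17 male, B73 product only.
print(classify_allele_expression("B73", "Mo17", {"B73"}))
```

Running the same check on both directions of the cross separates a true parent-of-origin effect from a difference between the two inbred alleles themselves.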

Example 12 Construction of FIE-Null Genetic Backgrounds and Inactivation of ZmFIE Genes by the Mutator Transposon Insertions (TUSC)

Gene inactivation can be used to determine the function of ZmFIE genes in the regulation of endosperm development. When fertilization is prevented in Arabidopsis plants heterozygous for fie mutant alleles, siliques nevertheless elongate and contain seed-like structures due to partial endosperm development. No embryo development is observed (Ohad, Yadegari et al. (1999) Plant Cell 11:407-416). Maize fie mutants would be expected to develop endosperm (or kernels) in the absence of fertilization (i.e. when immature ears are protected from pollination by bags).

The Pioneer proprietary system TUSC (Trait Utility System for Corn) was used to screen for FIE genes disrupted by Mutator transposable element insertion. F₂ families segregating for the Mutator insertions were screened by PCR with the Mu-specific primer (SEQ ID NO: 79) and FIE-A or FIE-B gene-specific primers (SEQ ID NOS: 80-82). No positive signals were found for the Mutator insertions in the ZmFIE-A gene. However, six Mu insertions were identified in the ZmFIE-B gene. The Mu insertion sites were sequenced. Data are shown in the following table:

TABLE 4
Mu insertion sites

Allele # | Allele name | Individual plants in TUSC pools | Site of Mu insertion
1 | fieb::Mu61E09 | PV03 61 E-09 | 234 nt upstream of ATG
2 | fieb::Mu25C04 | BT94 25 C-04 | 188 bp upstream of ATG
3 | fieb::Mu57B12 | PV03 57 B12 | 183 bp upstream of ATG
4 | fieb::Mu217 | I6A89718 B217 | 138 bp upstream of ATG
5 | fieb::Mu203 | I6A80321 B203 | 138 bp upstream of ATG
6 | fieb::Mu29A08 | BT94 29 A08 | 4 bp of 1st exon/intron junction

All Mu insertions occurred in non-coding regions of ZmFIE-B. Alleles #1-5 represent the Mu insertions in the 5′ UTR at distances of 138 to 234 bp upstream of the translation start codon ATG. Allele #6 carries the Mu insertion in the first intron, 4 nucleotides past the exon/intron junction.

Homozygous plants were obtained for alleles #1-5. Transcription of ZmFIE-B is not affected in the Mu homozygous plants, as has been shown by RT-PCR. Those plants do not demonstrate the expected phenotype of developing endosperm (or kernels) in the absence of fertilization. One possible explanation for the normal function of ZmFIE-B with the upstream Mu insertions is the outward-reading promoter at the end of Mu (Barkan and Martienssen (1991) Proc. Natl. Acad. Sci. USA 88:3502-3506). This promoter may support transcription of the fieb::Mu alleles. No changes in phenotype were seen as a result of these Mu insertions.

To isolate derivative alleles at the ZmFIE-B locus that no longer require Mutator activity and are stable null alleles, the site-selected transposon mutagenesis (SSTM) method was used (Plant Cell 7:287-294, 1995). The Mu element generates flanking deletions resulting in null alleles at frequencies approaching 1% (Taylor and Walbot (1985) EMBO J. 4:869-876). To generate flanking deletions at the ZmFIE-B locus, plants homozygous for fieb::Mu alleles were crossed with the Mu-active line les22 (wherein white necrotic lesions are a marker for the presence of the active Mutator; Hu, Yalpani, et al. (1998) Plant Cell 10:1095-1105). The progeny of this cross, Mu-active fieb::Mu/+, were crossed to the Mo17 inbred to produce seed with potential Mu-flanking deletions. Screening for the flanking deletions was performed by PCR with the Mu- and fieb-specific primers (see above). DNA was isolated from seedling leaf punches using the Puregene kit (Gentra Systems, Minneapolis, Minn.) according to the manufacturer's protocol. Initially, four deletions, 100-200 nt long, were identified from the fieb::Mu allele #2.
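The practical consequence of a deletion frequency near 1% is the number of progeny that must be screened. The following Python sketch estimates that number; it assumes each progeny plant is an independent trial with the same deletion probability, and the 95% confidence target is an illustrative choice rather than a figure from the original work.

```python
import math


def progeny_to_screen(deletion_frequency: float, confidence: float) -> int:
    """Smallest n such that P(at least one deletion among n progeny) >= confidence.

    Assumes independent progeny with identical deletion probability
    (an idealization; real Mu activity can vary between plants).
    """
    if not (0.0 < deletion_frequency < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("both arguments must lie strictly between 0 and 1")
    # Solve 1 - (1 - f)**n >= confidence for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - deletion_frequency))


# Frequency taken as 1% (the text says "approaching 1%"); 95% is an assumed target.
print(progeny_to_screen(0.01, 0.95))
```

Because the cited frequency is described as an upper limit, a lower real frequency would raise the required population size accordingly.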

SSTM represents an efficient way to generate stable null alleles from the original TUSC material in those cases when Mu insertions occur in “non-coding” neutral regions of the genes. These derivative deletions provide the genetic material for phenotypic and cytological analysis to determine the role of the FIE gene in controlling endosperm development in maize.

Example 13 Use of ZmFIE Mutants with Maize CHD to Induce Apomixis

A “CHD polypeptide” refers to a polypeptide containing 3 domains: a chromatin organization modifier, a helicase SNF-2 related/ATP domain, and a DNA binding domain. Down-regulation of CHD in transformed maize is expected to result in a more embryogenic callus phenotype. (See pending U.S. patent application Ser. No. 60/251,555, filed Dec. 6, 2000.)

Maize expression cassettes down-regulating CHD expression (CHD-DR) in the inner integument or nucellus can easily be constructed. An expression cassette directing expression of the CHD-DR polynucleotide to the nucellus is made using the barley Nuc1 promoter (See pending U.S. patent application Ser. No. 09/703,754, filed Nov. 1, 2000). Embryos are co-bombarded with the selectable marker PAT fused to the GFP gene (UBI::moPAT˜moGFP) along with the nucellus-specific CHD-DR expression cassette described above. Both inbred (P38) and GS3 transformants are obtained and regenerated as described in Example 14.

When such nuc1:CHD-DR transformation is accomplished in a mutant fie background, both de novo embryo development and endosperm development without fertilization could occur (see Ohad et al. 1999 The Plant Cell 11:407-415). Upon microscopic examination of the developing embryos it will be apparent that apomixis has occurred by the presence of embryos budding off the nucellus.

Example 14 Expression of Chimeric Genes in Monocot Cells

A chimeric gene is constructed which comprises a cDNA encoding the instant polypeptides in sense orientation with respect to the maize 27 kD zein promoter located 5′ to the cDNA fragment, and the 10 kD zein 3′ end located 3′ to the cDNA fragment. The cDNA fragment of this gene may be generated by polymerase chain reaction (PCR) of the cDNA clone using appropriate oligonucleotide primers. Cloning sites (NcoI or SmaI) can be incorporated into the oligonucleotides to provide proper orientation of the DNA fragment when inserted into the digested vector pML103 as described below. Amplification is then performed in a standard PCR. The amplified DNA is then digested with restriction enzymes NcoI and SmaI and fractionated on an agarose gel. The appropriate band can be isolated from the gel and combined with a 4.9 kb NcoI-SmaI fragment of the plasmid pML103. Plasmid pML103 has been deposited under the terms of the Budapest Treaty at ATCC (American Type Culture Collection, 10801 University Blvd., Manassas, Va. 20110-2209), and bears accession number ATCC 97366. The DNA segment from pML103 contains a 1.05 kb SalI-NcoI promoter fragment of the maize 27 kD zein gene and a 0.96 kb SmaI-SalI fragment from the 3′ end of the maize 10 kD zein gene in the vector pGem9Zf(+) (Promega). Vector and insert DNA can be ligated at 15° C. overnight, essentially as described (Maniatis). The ligated DNA may then be used to transform E. coli XL1-Blue (Epicurian Coli XL-1 Blue®; Stratagene). Bacterial transformants can be screened by restriction enzyme digestion of plasmid DNA and limited nucleotide sequence analysis using the dideoxy chain termination method (Sequenase® DNA Sequencing Kit; U.S. Biochemical). The resulting plasmid construct comprises a chimeric gene encoding, in the 5′ to 3′ direction, the maize 27 kD zein promoter, a cDNA fragment encoding the instant polypeptides, and the 10 kD zein 3′ region.
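Directional cloning with NcoI and SmaI works only if the amplified cDNA carries no additional internal copies of those sites. A minimal Python sketch of that pre-check follows. The recognition sequences (CCATGG for NcoI, CCCGGG for SmaI) are the standard ones; the example insert is a made-up toy sequence, and the function name is hypothetical.

```python
# Sketch of a pre-cloning check for internal restriction sites.
# Both recognition sequences are palindromic, so scanning one strand suffices.
SITES = {"NcoI": "CCATGG", "SmaI": "CCCGGG"}


def find_sites(seq: str, sites: dict = SITES) -> dict:
    """Return 0-based start positions of each recognition sequence in seq."""
    seq = seq.upper()
    hits = {}
    for name, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            # Step by one so overlapping occurrences are not missed.
            start = seq.find(motif, start + 1)
        hits[name] = positions
    return hits


# Toy amplicon (hypothetical): NcoI site at the 5' end, SmaI site at the 3' end,
# plus one unwanted internal NcoI site that would split the insert on digestion.
toy_insert = "CCATGGCTAAGGTTCCATGGAACCCGGG"
print(find_sites(toy_insert))
```

An insert reporting more than one position for either enzyme would need a different cloning strategy, such as a partial digest or alternative primer-borne sites.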

The chimeric gene described above can then be introduced into maize cells by the following procedure. Immature maize embryos can be dissected from developing caryopses derived from crosses of the inbred maize lines H99 and LH132. The embryos are isolated 10 to 11 days after pollination when they are 1.0 to 1.5 mm long. The embryos are then placed in contact with agarose-solidified N6 medium (Chu et al. (1975) Sci. Sin. Peking 18:659-668), axis-side down. The embryos are kept in the dark at 27° C. Friable embryogenic callus, consisting of undifferentiated masses of cells with somatic proembryoids and embryoids borne on suspensor structures, proliferates from the scutellum of these immature embryos. The embryogenic callus isolated from the primary explant can be cultured on N6 medium and sub-cultured on this medium every 2 to 3 weeks.

The plasmid p35S/Ac (obtained from Dr. Peter Eckes, Hoechst AG, Frankfurt, Germany) may be used in transformation experiments in order to provide for a selectable marker. This plasmid contains the pat gene (see European Patent Publication 0 242 236) which encodes phosphinothricin acetyl transferase (PAT). The enzyme PAT confers resistance to herbicidal glutamine synthetase inhibitors such as phosphinothricin. The pat gene in p35S/Ac is under the control of the 35S promoter from Cauliflower Mosaic Virus (Odell et al. (1985) Nature 313:810-812) and the 3′ region of the nopaline synthase gene from the T-DNA of the Ti plasmid of Agrobacterium tumefaciens.

The particle bombardment method (Klein et al. (1987) Nature 327:70-73) may be used to transfer genes to the callus culture cells. According to this method, gold particles (1 μm in diameter) are coated with DNA using the following technique: Ten μg of plasmid DNAs are added to 50 μL of a suspension of gold particles (60 mg per mL). Calcium chloride (50 μL of a 2.5 M solution) and spermidine free base (20 μL of a 1.0 M solution) are added to the particles. The suspension is vortexed during the addition of these solutions. After 10 minutes, the tubes are briefly centrifuged (5 sec at 15,000 rpm) and the supernatant removed. The particles are resuspended in 200 μL of absolute ethanol, centrifuged again and the supernatant removed. The ethanol rinse is performed again and the particles resuspended in a final volume of 30 μL of ethanol. An aliquot (5 μL) of the DNA-coated gold particles can be placed in the center of a Kapton® flying disc (Bio-Rad Labs). The particles are then accelerated into the maize tissue with a Biolistic® PDS-1000/He (Bio-Rad Instruments, Hercules Calif.), using a helium pressure of 1000 psi, a gap distance of 0.5 cm and a flying distance of 1.0 cm.
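The quantities in the coating step can be checked with a short calculation. The Python sketch below computes the gold mass, the approximate final reagent concentrations in the precipitation mix, and the DNA-to-gold ratio. It assumes that the 10 μg of DNA is the total across plasmids and that the DNA volume, which the text does not state, is negligible; the function name is hypothetical.

```python
def coating_mix(gold_ul, gold_mg_per_ml, cacl2_ul, cacl2_molar,
                spermidine_ul, spermidine_molar, dna_ug):
    """Summarize a DNA/gold precipitation mix.

    Assumption: the DNA volume is not given in the text and is ignored,
    so the final concentrations are slight overestimates.
    """
    total_ul = gold_ul + cacl2_ul + spermidine_ul
    gold_mg = gold_ul * gold_mg_per_ml / 1000.0  # mg/mL times uL, divided by 1000
    return {
        "total_ul": total_ul,
        "gold_mg": gold_mg,
        "cacl2_final_M": cacl2_ul * cacl2_molar / total_ul,
        "spermidine_final_M": spermidine_ul * spermidine_molar / total_ul,
        "dna_ug_per_mg_gold": dna_ug / gold_mg,
    }


# Values from the maize protocol above.
mix = coating_mix(gold_ul=50, gold_mg_per_ml=60, cacl2_ul=50, cacl2_molar=2.5,
                  spermidine_ul=20, spermidine_molar=1.0, dna_ug=10)
for key, value in mix.items():
    print(f"{key}: {value:.3f}")
```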

For bombardment, the embryogenic tissue is placed on filter paper over agarose-solidified N6 medium. The tissue is arranged as a thin lawn covering a circular area of about 5 cm in diameter. The petri dish containing the tissue can be placed in the chamber of the PDS-1000/He approximately 8 cm from the stopping screen. The air in the chamber is then evacuated to a vacuum of 28 inches of Hg. The macrocarrier is accelerated with a helium shock wave using a rupture membrane that bursts when the He pressure in the shock tube reaches 1000 psi.

Seven days after bombardment, the tissue can be transferred to N6 medium that contains glufosinate (2 mg per liter) and lacks casein or proline. The tissue continues to grow slowly on this medium. After an additional 2 weeks the tissue can be transferred to fresh N6 medium containing glufosinate. After 6 weeks, areas of about 1 cm in diameter of actively growing callus can be identified on some of the plates containing the glufosinate-supplemented medium. These calli may continue to grow when sub-cultured on the selective medium.

Plants can be regenerated from the transgenic callus by first transferring clusters of tissue to N6 medium supplemented with 0.2 mg per liter of 2,4-D. After two weeks the tissue can be transferred to regeneration medium (Fromm et al. (1990) Bio/Technology 8:833-839).

Example 15 Expression of Chimeric Genes in Dicot Cells

A seed-specific expression cassette composed of the promoter and transcription terminator from the gene encoding the β subunit of the seed storage protein phaseolin from the bean Phaseolus vulgaris (Doyle et al. (1986) J. Biol. Chem. 261:9228-9238) can be used for expression of the instant polypeptides in transformed soybean. The phaseolin cassette includes about 500 nucleotides upstream (5′) from the translation initiation codon and about 1650 nucleotides downstream (3′) from the translation stop codon of phaseolin. Between the 5′ and 3′ regions are the unique restriction endonuclease sites Nco I (which includes the ATG translation initiation codon), Sma I, Kpn I and Xba I. The entire cassette is flanked by Hind III sites.

The cDNA fragment of this gene may be generated by polymerase chain reaction (PCR) of the cDNA clone using appropriate oligonucleotide primers. Cloning sites can be incorporated into the oligonucleotides to provide proper orientation of the DNA fragment when inserted into the expression vector. Amplification is then performed as described above, and the isolated fragment is inserted into a pUC18 vector carrying the seed expression cassette.

Soybean embryos may then be transformed with the expression vector comprising sequences encoding the instant polypeptides. To induce somatic embryos, cotyledons 3-5 mm in length, dissected from surface-sterilized, immature seeds of the soybean cultivar A2872, can be cultured in the light or dark at 26° C. on an appropriate agar medium for 6-10 weeks. Somatic embryos which produce secondary embryos are then excised and placed into a suitable liquid medium. After repeated selection for clusters of somatic embryos which multiplied as early, globular staged embryos, the suspensions are maintained as described below.

Soybean embryogenic suspension cultures can be maintained in 35 mL liquid media on a rotary shaker, 150 rpm, at 26° C. with fluorescent lights on a 16:8 hour day/night schedule. Cultures are subcultured every two weeks by inoculating approximately 35 mg of tissue into 35 mL of liquid medium.

Soybean embryogenic suspension cultures may then be transformed by the method of particle gun bombardment (Klein et al. (1987) Nature (London) 327:70-73, U.S. Pat. No. 4,945,050). A DuPont Biolistic® PDS1000/HE instrument (helium retrofit) can be used for these transformations.

A selectable marker gene which can be used to facilitate soybean transformation is a chimeric gene composed of the 35S promoter from Cauliflower Mosaic Virus (Odell et al. (1985) Nature 313:810-812), the hygromycin phosphotransferase gene from plasmid pJR225 (from E. coli; Gritz et al. (1983) Gene 25:179-188) and the 3′ region of the nopaline synthase gene from the T-DNA of the Ti plasmid of Agrobacterium tumefaciens. The seed expression cassette comprising the phaseolin 5′ region, the fragment encoding the instant polypeptides and the phaseolin 3′ region can be isolated as a restriction fragment. This fragment can then be inserted into a unique restriction site of the vector carrying the marker gene.

To 50 μL of a 60 mg/mL 1 μm gold particle suspension are added (in order): 5 μL DNA (1 μg/μL), 20 μL spermidine (0.1 M), and 50 μL CaCl₂ (2.5 M). The particle preparation is then agitated for three minutes, spun in a microfuge for 10 seconds and the supernatant removed. The DNA-coated particles are then washed once in 400 μL 70% ethanol and resuspended in 40 μL of anhydrous ethanol. The DNA/particle suspension can be sonicated three times for one second each. Five μL of the DNA-coated gold particles are then loaded on each macro carrier disk.
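The per-disk load implied by this preparation can be worked out as follows. The Python sketch assumes that no gold is lost during the wash and that all of the DNA binds to the particles, so the DNA figure is an upper bound; the function name is hypothetical.

```python
def per_disk_load(gold_ul, gold_mg_per_ml, dna_ul, dna_ug_per_ul,
                  resuspension_ul, aliquot_ul):
    """Gold and DNA carried by one macrocarrier aliquot.

    Assumptions: no particle loss during washing and complete DNA binding,
    so the DNA value is an upper bound rather than a measured load.
    """
    gold_mg = gold_ul * gold_mg_per_ml / 1000.0
    dna_ug = dna_ul * dna_ug_per_ul
    fraction = aliquot_ul / resuspension_ul  # share of the prep on one disk
    return {
        "disks_per_prep": int(resuspension_ul // aliquot_ul),
        "gold_mg_per_disk": gold_mg * fraction,
        "max_dna_ug_per_disk": dna_ug * fraction,
    }


# Values from the soybean protocol above.
load = per_disk_load(gold_ul=50, gold_mg_per_ml=60, dna_ul=5, dna_ug_per_ul=1.0,
                     resuspension_ul=40, aliquot_ul=5)
print(load)
```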

Approximately 300-400 mg of a two-week-old suspension culture is placed in an empty 60×15 mm petri dish and the residual liquid removed from the tissue with a pipette. For each transformation experiment, approximately 5-10 plates of tissue are normally bombarded. Membrane rupture pressure is set at 1100 psi and the chamber is evacuated to a vacuum of 28 inches mercury. The tissue is placed approximately 3.5 inches away from the retaining screen and bombarded three times. Following bombardment, the tissue can be divided in half and placed back into liquid and cultured as described above.

Five to seven days post bombardment, the liquid media may be exchanged with fresh media, and eleven to twelve days post bombardment with fresh media containing 50 mg/mL hygromycin. This selective media can be refreshed weekly. Seven to eight weeks post bombardment, green, transformed tissue may be observed growing from untransformed, necrotic embryogenic clusters. Isolated green tissue is removed and inoculated into individual flasks to generate new, clonally propagated, transformed embryogenic suspension cultures. Each new line may be treated as an independent transformation event. These suspensions can then be subcultured and maintained as clusters of immature embryos or regenerated into whole plants by maturation and germination of individual somatic embryos.

Example 16 Expression of Chimeric Genes in Microbial Cells

The cDNAs encoding the instant polypeptides can be inserted into the T7 E. coli expression vector pBT430. This vector is a derivative of pET-3a (Rosenberg et al. (1987) Gene 56:125-135; see also www.novagen.com) which employs the bacteriophage T7 RNA polymerase/T7 promoter system. Plasmid pBT430 was constructed by first destroying the EcoR I and Hind III sites in pET-3a at their original positions. An oligonucleotide adaptor containing EcoR I and Hind III sites was inserted at the BamH I site of pET-3a. This created pET-3aM with additional unique cloning sites for insertion of genes into the expression vector. Then, the Nde I site at the position of translation initiation was converted to an Nco I site using oligonucleotide-directed mutagenesis. The DNA sequence of pET-3aM in this region, 5′-CATATGG, was converted to 5′-CCCATGG in pBT430.
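The effect of the mutagenesis step can be confirmed directly from the two sequences given. This Python sketch uses the standard recognition sequences for Nde I (CATATG) and Nco I (CCATGG); the variable and function names are illustrative only.

```python
# Recognition sequences (standard values for these enzymes).
NDE_I = "CATATG"
NCO_I = "CCATGG"

pet3am_region = "CATATGG"   # sequence at the translation start in pET-3aM
pbt430_region = "CCCATGG"   # same region after mutagenesis, in pBT430


def has_site(seq: str, site: str) -> bool:
    """True if the recognition sequence occurs anywhere in seq."""
    return site in seq.upper()


for label, region in (("pET-3aM", pet3am_region), ("pBT430", pbt430_region)):
    print(label,
          "Nde I:", has_site(region, NDE_I),
          "Nco I:", has_site(region, NCO_I),
          "ATG retained:", "ATG" in region)
```

The check shows that the change removes the Nde I site and creates an Nco I site while leaving the ATG initiation codon in place.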

Plasmid DNA containing a cDNA may be appropriately digested to release a nucleic acid fragment encoding the protein. This fragment may then be purified on a 1% low melting agarose gel. Buffer and agarose contain 10 μg/ml ethidium bromide for visualization of the DNA fragment. The fragment can then be purified from the agarose gel by digestion with GELase® (Epicentre Technologies, Madison, Wis.) according to the manufacturer's instructions, ethanol precipitated, dried and resuspended in 20 μL of water. Appropriate oligonucleotide adapters may be ligated to the fragment using T4 DNA ligase (New England Biolabs (NEB), Beverly, Mass.). The fragment containing the ligated adapters can be purified from the excess adapters using low melting agarose as described above. The vector pBT430 is digested, dephosphorylated with alkaline phosphatase (NEB) and deproteinized with phenol/chloroform as described above. The prepared vector pBT430 and fragment can then be ligated at 16° C. for 15 hours followed by transformation into DH5 electrocompetent cells (GIBCO BRL). Transformants can be selected on agar plates containing LB media and 100 μg/mL ampicillin. Transformants containing the gene encoding the instant polypeptides are then screened for the correct orientation with respect to the T7 promoter by restriction enzyme analysis.

For high level expression, a plasmid clone with the cDNA insert in the correct orientation relative to the T7 promoter can be transformed into E. coli strain BL21(DE3) (Studier et al. (1986) J. Mol. Biol. 189:113-130). Cultures are grown in LB medium containing ampicillin (100 mg/L) at 25° C. At an optical density at 600 nm of approximately 1, IPTG (isopropylthio-β-galactoside, the inducer) can be added to a final concentration of 0.4 mM and incubation can be continued for 3 h at 25° C. Cells are then harvested by centrifugation and re-suspended in 50 μL of 50 mM Tris-HCl at pH 8.0 containing 0.1 mM DTT and 0.2 mM phenylmethylsulfonyl fluoride. A small amount of 1 mm glass beads can be added and the mixture sonicated 3 times for about 5 seconds each time with a microprobe sonicator. The mixture is centrifuged and the protein concentration of the supernatant determined. One μg of protein from the soluble fraction of the culture can be separated by SDS-polyacrylamide gel electrophoresis. Gels can be observed for protein bands migrating at the expected molecular weight.
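The induction step specifies only the final IPTG concentration. The volume of stock to add follows from a simple dilution calculation, sketched below in Python. The 100 mM stock concentration and the 50 mL culture volume are assumptions chosen for illustration; neither appears in the original protocol.

```python
def stock_volume_ml(final_mM: float, culture_ml: float, stock_mM: float) -> float:
    """Volume of stock to add so the mixed culture reaches final_mM.

    Accounts for the added volume itself:
    stock_mM * v = final_mM * (culture_ml + v)
    """
    if stock_mM <= final_mM:
        raise ValueError("stock must be more concentrated than the target")
    return final_mM * culture_ml / (stock_mM - final_mM)


# 0.4 mM final is from the protocol; 50 mL culture and 100 mM stock are assumed.
volume = stock_volume_ml(final_mM=0.4, culture_ml=50.0, stock_mM=100.0)
print(f"add {volume * 1000:.0f} uL of stock")
```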

1. An isolated polynucleotide encoding a functional fertilization-independent endosperm (FIE) polypeptide, wherein the full-length sequence of said encoded polypeptide is at least 80% identical to SEQ ID NO: 28, based on BESTFIT, using default parameters.

2. The isolated polynucleotide of claim 1, wherein the full-length sequence of said encoded polypeptide sequence is at least 90% identical to SEQ ID NO: 28, based on BESTFIT, using default parameters.

3. The isolated polynucleotide of claim 1, wherein the nucleotide sequence is at least 80% identical to SEQ ID NO: 27, based on BESTFIT, using default parameters.

4. The isolated polynucleotide of claim 1, wherein the nucleotide sequence is at least 90% identical to SEQ ID NO: 27, based on BESTFIT, using default parameters.

5. The isolated polynucleotide of claim 1, wherein the nucleotide sequence comprises SEQ ID NO: 27.

6. A recombinant expression cassette, comprising the polynucleotide of claim 1 or a fragment thereof, which fragment may or may not encode a functional FIE polypeptide, operably linked to a promoter which directs expression in the reproductive tissues of a plant.

7. The recombinant expression cassette of claim 6, wherein said polynucleotide or fragment is operably linked in antisense orientation to said promoter.

8. A host cell transformed with the recombinant expression cassette of claim 6.

9. A transgenic plant comprising the recombinant expression cassette of claim 6.

10. The transgenic plant of claim 9, wherein the plant is Zea mays.

11. A method for producing seed in the absence of fertilization, the method comprising reduction of expression, within a plant, of the polynucleotide of claim 1.

12. A method for altering endosperm development in seed, the method comprising modulation of expression, within a plant, of the polynucleotide of claim 1.

13. A method of plant reproduction comprising embryogenesis from callus tissue derived from fie germplasm, wherein tissue-specific downregulation of a CHD polypeptide stimulates embryogenesis and wherein the fie germplasm comprises endosperm development without fertilization.

14. A method of modulating seed development in a plant, comprising: a) transforming a plant cell with the recombinant expression cassette of claim 6; b) growing said plant cell under conditions which favor plant regeneration; c) regenerating a plant from said transformed plant cell; and d) growing said plant under conditions which allow or induce expression of said polynucleotide, wherein expression of said polynucleotide results in fertilization-independent endosperm development.